Abstract: Gaussian mixture models are widely used to study clustering problems. These
model-based clustering methods require an accurate estimation of the unknown data density
by Gaussian mixtures. In Maugis and Michel (2009), a penalized maximum likelihood estimator
is proposed for automatically selecting the number of mixture components. In the present
paper, a collection of univariate densities whose logarithm is locally $\beta$-Hölder with moment
and tail conditions is considered. We show that this penalized estimator is minimax adaptive
to the $\beta$ regularity of such densities in the Hellinger sense.
1. Introduction

Clustering consists of discovering clusters among observations. Many cluster analysis methods
have been proposed in statistics and learning theory; they roughly fall into three categories. The
first one is based on similarity or dissimilarity distances; the best-known examples are partitioning
methods such as k-means and hierarchical clustering methods (see for instance Sections 14.3.6 and
14.3.12 in Hastie et al., 2009). The second category consists of density level set clustering methods,
which consider clusters as the connected components of high density regions (see Hartigan, 1975).
The third category is composed of model-based clustering methods, which define clusters as sets of
observations most likely drawn from the same distribution. In this last case, each subpopulation is
assumed to be distributed according to a parametric density, such as a Gaussian one, and thus the
unknown data density is a mixture of these distributions (see for instance McLachlan and Peel, 2000).
The data clustering is then deduced from the maximum a posteriori (MAP) rule. Since the clustering
relies on data density estimation, it is essential that this density be efficiently estimated.
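As an illustration of the MAP rule mentioned above, here is a minimal sketch that assigns each observation to the mixture component with the largest posterior probability. The function names, the use of the standard normal density as component density, and the example parameters are assumptions made for this illustration only; they are not taken from Maugis and Michel (2009).

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x (standard parametrisation, assumed here)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def map_labels(data, weights, means, sds):
    """Assign each observation to the component with the largest posterior weight."""
    labels = []
    for x in data:
        # Unnormalised posteriors p_u * N(x; mu_u, sd_u^2). The normalising
        # constant is common to all components, so the argmax is unaffected.
        scores = [p * normal_pdf(x, mu, sd) for p, mu, sd in zip(weights, means, sds)]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


# Hypothetical two-component mixture with equal proportions.
labels = map_labels([-2.1, -1.8, 0.1, 1.9, 2.3], [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])
print(labels)
```

In practice the mixture parameters are unknown and are replaced by estimates, which is why the quality of the density estimator matters for the resulting clustering.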

Because of their great flexibility, Gaussian mixture densities are widely used to model
the unknown distribution of continuous data for clustering analysis (see for instance Lindsay, 1995;
McLachlan and Peel, 2000). By recasting the clustering problem as a model selection problem, we
proposed in Maugis and Michel (2009) a non asymptotic penalized criterion. We proved that
the selected Gaussian mixture estimator fulfills an oracle inequality. The aim of this new paper is to
investigate the adaptive properties of this estimator in order to justify the validity of our clustering
method. More precisely, adapting a recent approximation result, we show that our estimator is
minimax adaptive to the regularity parameter of a particular class of Hölder spaces defined below.
As far as we know, such a minimax adaptive result has never been shown for a density estimator
used for model-based clustering methods.

We first recall the context of Maugis and Michel (2009) in the unidimensional case. Let us consider
$n$ independent identically distributed random variables $X_1, \ldots, X_n$ with values in $\mathbb{R}$. Their
common unknown density $s$ belongs to the set $\mathcal{S}$ of all density functions with respect to the Lebesgue
measure on $\mathbb{R}$. The considered unidimensional Gaussian mixtures are characterized by their number
of components $m$, and their mean and variance parameters are assumed to be bounded. These
mixture densities are grouped into a model collection $(S_m)_{m \in \mathcal{M}_n}$, subsets of $\mathcal{S}$, defined by

$$S_m = \Big\{ x \in \mathbb{R} \mapsto \sum_{u=1}^{m} p_u \psi_{\sigma_u}(x - \mu_u);\ \mu_u \in [-\bar\mu, \bar\mu],\ \sigma_u^2 \in [\underline\lambda, \bar\lambda],\ p_u \in [0, 1],\ \sum_{u=1}^{m} p_u = 1 \Big\} \tag{1}$$

where $\psi$ is the Gaussian kernel defined by $\psi(x) = \pi^{-1/2} \exp(-x^2)$ for all $x \in \mathbb{R}$ and $\psi_\sigma(\cdot) = \sigma^{-1} \psi(\cdot/\sigma)$
for all $\sigma > 0$. The number of free parameters, common to all the mixture densities of a given model
$S_m$, is called the dimension and is denoted $D(m)$. Considering a non asymptotic point of view (see for
instance Massart, 2007), the three bounds $\bar\mu$, $\underline\lambda$ and $\bar\lambda$ of each model $S_m$ and also the maximum
number of mixture components in the collection may depend on $n$. Such mixtures are called sieves
according to the terminology introduced by Grenander (1981).
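The following minimal sketch evaluates a mixture density built from the kernel $\psi(x) = \pi^{-1/2}\exp(-x^2)$ and checks the box constraints that define a model of the collection. The helper names (`mixture_density`, `in_model`, `riemann_integral`) and the numerical values are hypothetical and serve only as an illustration of definition (1).

```python
import math


def psi(x):
    """Gaussian kernel psi(x) = pi^{-1/2} exp(-x^2)."""
    return math.exp(-x * x) / math.sqrt(math.pi)


def psi_scaled(x, sigma):
    """Scaled kernel psi_sigma(x) = psi(x / sigma) / sigma."""
    return psi(x / sigma) / sigma


def mixture_density(x, weights, means, sigmas):
    """Value at x of sum_u p_u * psi_{sigma_u}(x - mu_u)."""
    return sum(p * psi_scaled(x - mu, s) for p, mu, s in zip(weights, means, sigmas))


def in_model(weights, means, sigmas, mu_bar, lam_low, lam_up, tol=1e-12):
    """Check the constraints on proportions, means and squared scales in (1)."""
    return (
        len(weights) == len(means) == len(sigmas)
        and all(0.0 <= p <= 1.0 for p in weights)
        and abs(sum(weights) - 1.0) <= tol
        and all(-mu_bar <= mu <= mu_bar for mu in means)
        and all(lam_low <= s * s <= lam_up for s in sigmas)
    )


def riemann_integral(f, lo, hi, n=20000):
    """Midpoint rule, accurate enough here for smooth, fast-decaying integrands."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


w, mu, sg = [0.3, 0.7], [-1.0, 2.0], [0.5, 1.5]
print(in_model(w, mu, sg, mu_bar=3.0, lam_low=0.1, lam_up=4.0))
print(riemann_integral(lambda x: mixture_density(x, w, mu, sg), -20.0, 20.0))
```

With this normalisation each component $\psi_\sigma$ integrates to one, so any convex combination of components is again a density.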

Over each model $S_m$, a maximum likelihood estimator (MLE) $\hat{s}_m$ is obtained by minimizing the
empirical contrast

$$\gamma_n(t) = -\frac{1}{n} \sum_{i=1}^{n} \ln\{t(X_i)\}.$$
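A minimal sketch of the empirical contrast, assuming a candidate density given as a Python callable. The single-component candidates and the three-point sample are made up for illustration; a smaller contrast means a larger log-likelihood.

```python
import math


def empirical_contrast(density, sample):
    """gamma_n(t) = -(1/n) * sum_i ln t(X_i) for a candidate density t."""
    return -sum(math.log(density(x)) for x in sample) / len(sample)


def scaled_kernel_density(mu, sigma):
    """Single component x -> psi_sigma(x - mu), with psi(x) = pi^{-1/2} exp(-x^2)."""
    def t(x):
        z = (x - mu) / sigma
        return math.exp(-z * z) / (sigma * math.sqrt(math.pi))
    return t


sample = [-0.5, 0.0, 0.5]  # toy observations, for illustration only
centred = empirical_contrast(scaled_kernel_density(0.0, 1.0), sample)
shifted = empirical_contrast(scaled_kernel_density(3.0, 1.0), sample)
print(centred, shifted)
```

The candidate centred on the data has the smaller contrast, which is the behaviour the MLE exploits within each model.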

The loss function associated with the likelihood contrast is the Kullback-Leibler divergence: for two
densities $s$ and $t$ in $\mathcal{S}$, the Kullback-Leibler divergence is defined by

$$\mathrm{KL}(s, t) = \int \ln\Big\{\frac{s(x)}{t(x)}\Big\}\, s(x)\, dx$$

if $s\,dx$ is absolutely continuous with respect to $t\,dx$ and $+\infty$ otherwise. The model $m^*$ in the collection
minimizing the Kullback-Leibler risk

$$m^* \in \operatorname*{argmin}_{m \in \mathcal{M}_n} \mathbb{E}_s[\mathrm{KL}(s, \hat{s}_m)]$$

is considered as the "best" model of the collection. Nevertheless this best model $m^*$ and also the
associated density $\hat{s}_{m^*}$ (called oracle) are unknown since they depend on the true density $s$. A
model $\hat{m}$ is then chosen by minimizing over $\mathcal{M}_n$ the following penalized criterion

$$\mathrm{crit}(m) = \gamma_n(\hat{s}_m) + \mathrm{pen}(m).$$
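The selection step can be sketched as follows. This is only a toy illustration: the penalty shape $\kappa D(m)/n$, the dimension count $D(m) = 3m - 1$ and the contrast values are all assumptions introduced here (made-up numbers), not the calibrated penalty of Maugis and Michel (2009).

```python
def select_model(contrasts, dimensions, n, kappa=1.0):
    """Return the model index minimising contrast + kappa * D(m) / n.

    `contrasts[m]` stands for gamma_n(s_hat_m); the penalty shape is a
    simplified, hypothetical stand-in for the penalty of the paper.
    """
    crit = {m: contrasts[m] + kappa * dimensions[m] / n for m in contrasts}
    best = min(crit, key=crit.get)
    return best, crit


# Made-up contrast values that decrease as the number of components grows.
contrasts = {1: 1.60, 2: 1.42, 3: 1.41, 4: 1.405}
# Assumed parameter count: m means, m variances, m - 1 free proportions.
dimensions = {m: 3 * m - 1 for m in contrasts}
best, crit = select_model(contrasts, dimensions, n=100)
print(best)
```

The contrast alone always favours the largest model; the penalty term is what stops the selection once an extra component no longer improves the fit enough.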

The penalty function $\mathrm{pen} : m \in \mathcal{M}_n \mapsto \mathrm{pen}(m) \in \mathbb{R}_+$ has to be chosen such that the Kullback-Leibler
risk $\mathbb{E}_s[\mathrm{KL}(s, \hat{s}_{\hat{m}})]$ of $\hat{s}_{\hat{m}}$ is close to the oracle risk $\mathbb{E}_s[\mathrm{KL}(s, \hat{s}_{m^*})]$. The construction of such
penalties is proposed in Theorem 2.2 in Maugis and Michel (2009). This theorem can be stated as
follows in the univariate context, where $d_H(g, h) = \|\sqrt{g} - \sqrt{h}\|_2$ denotes the Hellinger distance
between two densities $g$ and $h$ of $\mathcal{S}$:

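Theorem 1. There exist two absolute constants $\kappa$ and $C$ such that, if

$$\mathrm{pen}(m) \ge \kappa\, \frac{D(m)}{n} \Big\{ 1 + 2A^2 + \ln\big( [\ldots] \big) \Big\}$$

where

$$A = \sqrt{\ln(6\pi e^2)} + \sqrt{\pi} + \sqrt{\ln\big( [\ldots] \big)},$$

then the model $\hat{m}$ minimizing

$$\mathrm{crit}(m) = \gamma_n(\hat{s}_m) + \mathrm{pen}(m) \tag{2}$$

over $\mathcal{M}_n$ exists and

$$\mathbb{E}\big[d_H^2(s, \hat{s}_{\hat{m}})\big] \le C \Big[ \inf_{m \in \mathcal{M}_n} \{\mathrm{KL}(s, S_m) + \mathrm{pen}(m)\} + [\ldots] \Big]. \tag{3}$$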

Note that a similar result can be found in Maugis and Michel (2009) for multivariate data
clustering with variable selection. The method has been successfully implemented and tested in
practice (see Maugis and Michel, 2010).
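The two discrepancies that appear in the oracle inequality, the Kullback-Leibler divergence and the squared Hellinger distance $\|\sqrt{g} - \sqrt{h}\|_2^2$, can be approximated numerically. The sketch below uses a midpoint rule on a truncated interval; the helper names and the pair of unit-variance normal densities are assumptions chosen because both quantities then have simple closed forms.

```python
import math


def normal_pdf(mean, sd):
    """Return the density x -> N(x; mean, sd^2) (standard parametrisation)."""
    def f(x):
        z = (x - mean) / sd
        return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return f


def integrate(f, lo, hi, n=20000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def kl_divergence(s, t, lo, hi):
    """KL(s, t) = int ln(s / t) s dx, approximated on [lo, hi]."""
    return integrate(lambda x: s(x) * math.log(s(x) / t(x)), lo, hi)


def hellinger_sq(g, h, lo, hi):
    """d_H^2(g, h) = || sqrt(g) - sqrt(h) ||_2^2, without a factor 1/2."""
    return integrate(lambda x: (math.sqrt(g(x)) - math.sqrt(h(x))) ** 2, lo, hi)


s, t = normal_pdf(0.0, 1.0), normal_pdf(1.0, 1.0)
kl = kl_divergence(s, t, -12.0, 13.0)   # closed form for this pair: 1/2
h2 = hellinger_sq(s, t, -12.0, 13.0)    # closed form: 2 * (1 - exp(-1/8))
print(kl, h2)
```

For this pair the squared Hellinger distance is smaller than the Kullback-Leibler divergence, in line with the classical inequality $d_H^2 \le \mathrm{KL}$ for this normalisation of the Hellinger distance.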

Minimax adaptive estimation has been intensively studied in nonparametric statistics; see for
instance Tsybakov (2009), and Massart (2007) for adaptive minimax methods based on penalization.
A natural optimality criterion is the minimax risk, first introduced by Wolfowitz (1950).
Let

$$\mathcal{R}(\hat{s}_n, \mathcal{H}_\beta) = \sup_{s \in \mathcal{H}_\beta} \mathbb{E}_s[d_H^2(s, \hat{s}_n)]$$

be the maximal Hellinger risk of an estimator $\hat{s}_n$ of $s$. The minimax Hellinger risk on a density class
$\mathcal{H}_\beta$ is then defined by

$$\mathcal{R}_n(\mathcal{H}_\beta) = \inf_{\hat{s}_n} \mathcal{R}(\hat{s}_n, \mathcal{H}_\beta)$$

where the infimum is taken over all the possible estimators $\hat{s}_n$ of $s$. An estimator is said to be
minimax on $\mathcal{H}_\beta$ if its maximal risk over $\mathcal{H}_\beta$ reaches the minimax risk on this density class. Let us
now consider a collection $(\mathcal{H}_\beta)_{\beta \in B}$ of density classes indexed by a set $B$ of regularity parameters $\beta$.
An estimator is said to be minimax adaptive if it reaches the minimax risk over $\mathcal{H}_\beta$ for all $\beta$ of $B$,
without using the knowledge of $\beta$. In order to motivate the clustering method based on the Gaussian
mixture estimator $\hat{s}_{\hat{m}}$ proposed in Maugis and Michel (2009), we prove in this new paper that this
estimator is minimax adaptive over a particular collection of Hölder density classes $(\mathcal{H}_\beta)_{\beta \in B}$ defined
below. Of course, adaptive density estimation in one dimension is now a classical problem and
several adaptive estimators have already been proposed, such as kernel estimators or thresholding
wavelet estimators. Nevertheless, although these alternative methods may perform better than
our penalized estimator $\hat{s}_{\hat{m}}$ for density estimation in general, they are of no use for
clustering purposes.

The link between model selection and adaptive estimation is made through approximation theory.
Indeed, adaptive estimation is possible only for functional classes $\mathcal{H}_\beta$ that can be efficiently
approximated by our Gaussian mixture collection. Convolution is widely used in approximation theory
and many results are known on this topic. It is well known that the convolution of a density $f$ with
scaled versions $\psi_\sigma$ of the Gaussian kernel $\psi$ converges to $f$ (see for instance Cheney and Light,
2009, chapter 20). The so-called quasi-interpolation method consists of replacing the functions
$\psi_\sigma * f$ by infinite linear combinations of scaled and translated Gaussian kernels (see for instance
Cheney and Light, 2009, chapter 36). In a recent paper, Hangelbroek and Ron (2010) define a nonlinear
approximation algorithm based on finite combinations of scaled and translated Gaussian kernels and
give approximation results in $L^p$ norm on some particular density classes. Nevertheless,
none of these results can be directly applied to study the approximation capacities of Gaussian
mixtures. Indeed, the coefficients in these linear combinations are not necessarily positive and their
sum is not constrained to be equal to one. Furthermore, the approximation results provided by
all these methods are not given for the Kullback-Leibler divergence as required by our statistical
context.

The approximation capacity of Gaussian mixtures has also been studied in nonparametric
Bayesian works. Lemma 3.1 in Ghosal and van der Vaart (2001) gives a discretization result for
Gaussian mixtures: if $s$ is a location or location-scale mixture with a mixing distribution that is
compactly supported or has sub-Gaussian tails, then $s$ can be approximated by a finite Gaussian
mixture with a small number of components, the error being controlled in $L_1$ and $L_\infty$ norms. In
Ghosal and van der Vaart (2007), these authors take advantage of this method for approximating
by finite Gaussian mixtures some twice continuously differentiable functions with additional
regularity conditions. More recently, Kruijer et al. (2010) proved an approximation result by finite Gaussian
mixtures for densities whose logarithm is locally Hölder. Their approximation result is given for the
Kullback-Leibler divergence. This last result can be successfully adapted to our context to control
the bias term in the right-hand side of the oracle inequality (3) on these particular density classes.
Concerning approximation, the contribution of our work consists of checking that the non explicit
constants of the approximation bounds given in Kruijer et al. (2010) are actually uniform over a
density class $\mathcal{H}_\beta$ that we define. For easier reading, all the approximation results are given and proved
in this preprint version although a large part of them can be found in Kruijer et al. (2010).

The paper is organized as follows. The main results are presented in Section 2. The density classes
$\mathcal{H}_\beta$ are introduced in Section 2.1 and an approximation result, adapted from Kruijer et al. (2010), is
given in Section 2.2. Next, a lower bound of the minimax risk is given in Section 2.3 and the adaptive
property of our penalized Gaussian mixture estimator on these density classes $\mathcal{H}_\beta$ is addressed in
Section 2.4. The approximation result, the lower bound and the adaptive result are proved in
Sections 3, 4 and 5 respectively. Finally, some technical results are developed in Appendices A and
B.

2. Main results 

2.1. The density classes $\mathcal{H}(\beta, \mathcal{P})$

The adaptation result given below requires a slightly modified version of the approximation result
by finite Gaussian mixtures proved in Kruijer et al. (2010). This approximation result concerns
densities whose logarithm is locally $\beta$-Hölder and that fulfill additional tail, moment and
monotonicity conditions. More precisely, let $\beta > 0$, let $r = \lfloor \beta \rfloor$ be the largest integer less than $\beta$ and let $k \in \mathbb{N}$
be such that $\beta \in (2k, 2k + 2]$. Let also $\mathcal{P}$ be the set of parameters $\{\gamma, l^+, L, \varepsilon, C, \alpha, \xi, M\}$ where $L$ is
a polynomial function on $\mathbb{R}$ and the other parameters are positive constants. We then define the
density class $\mathcal{H}(\beta, \mathcal{P})$ of all densities $f$ satisfying the following conditions:
1. Smoothness. $\ln f$ is assumed to be locally $\beta$-Hölder: for all $x$ and $y$ such that $|y - x| \le \gamma$,

$$\big|(\ln f)^{(r)}(x) - (\ln f)^{(r)}(y)\big| \le r!\, L(x)\, |y - x|^{\beta - r}. \tag{4}$$

Furthermore, for all $j \in \{0, \ldots, r\}$,

$$\big|(\ln f)^{(j)}(0)\big| \le l^+. \tag{5}$$

2. Moments. The derivative functions $(\ln f)^{(j)}$ for $j = 1, \ldots, r$ and the polynomial function $L$
fulfill

$$\int \big|(\ln f)^{(j)}(x)\big|^{\frac{2\beta + \varepsilon}{j}} f(x)\, dx \le C, \qquad \int |L(x)|^{2 + \frac{\varepsilon}{\beta}} f(x)\, dx \le C. \tag{6}$$

3. Tail. For all $x \in \mathbb{R}$,

$$f(x) \le M \psi(x). \tag{7}$$

4. Monotonicity. $f$ is strictly positive, $f$ is nondecreasing on $(-\infty, -\alpha)$ and nonincreasing on
$(\alpha, \infty)$, and $f(x) \ge \xi$ for all $x \in [-\alpha, \alpha]$.

Remark 1. The monotonicity assumption can be relaxed by assuming that there exist two constants
$c > 0$ and $\bar\sigma > 0$ such that $\forall\, 0 < \sigma < \bar\sigma$, $\forall x \in \mathbb{R}$,

$$(f * \psi_\sigma)(x) \ge c\, f(x).$$

This condition corresponds to the first point given in Lemma 13 in Appendix A, which is a key point
in the proof of the approximation result. In the following, the strong monotonicity condition is assumed
in the definition of the density class $\mathcal{H}(\beta, \mathcal{P})$ to simplify the proofs of the lower bound.

Remark 2. For easier reading, the monotonicity assumption is stated on a symmetric interval
but it is possible to consider this assumption on a general interval $[\alpha_1, \alpha_2]$ with $\alpha_1 < \alpha_2$. This
monotonicity assumption allows us to lower bound the convolution $f * \psi_\sigma$ by $f$ up to a multiplicative
constant according to Remark 3 in Ghosal et al. (1999).

Remark 3. These density classes are more restrictive than those considered in Kruijer et al.
(2010): indeed the upper bounds in (6) have to be uniform on the density class $\mathcal{H}(\beta, \mathcal{P})$ and we
also need the additional Condition (5). These restrictions allow us to control the Kullback-Leibler
divergence between a density of $\mathcal{H}(\beta, \mathcal{P})$ and a convenient finite Gaussian mixture, uniformly over
$\mathcal{H}(\beta, \mathcal{P})$. Note that Condition (7) is here assumed on $\mathbb{R}$ but it could be assumed only outside an
interval as in Kruijer et al. (2010).

Remark 4. In the sequel, $\mathcal{P}'$ is said to be "larger than" $\mathcal{P}$ if at least one of the following conditions
is fulfilled:

• at least one constant among $M$, $C$ or $l^+$ of $\mathcal{P}'$ is larger than the corresponding one of $\mathcal{P}$,

• the constant $\gamma$ of $\mathcal{P}'$ is smaller than the corresponding one of $\mathcal{P}$,

• for all $x \in \mathbb{R}$, $L(x) \le L'(x)$ where $L$ (resp. $L'$) belongs to $\mathcal{P}$ (resp. $\mathcal{P}'$).
2.2. Approximation result

For any function $f$, $K_\sigma f$ denotes the convolution $f * \psi_\sigma$ and $\Delta_\sigma f$ is the error term $K_\sigma f - f$.
As explained in Kruijer et al. (2010), for a $\beta$-smooth density $f$ with $\beta \le 2$ and under reasonable
regularity assumptions, it is possible to define a finite location-scale Gaussian mixture $p_\sigma$ such that
$\mathrm{KL}(f, p_\sigma) = O(\sigma^{2\beta})$. The usual approach consists of discretizing the continuous mixture $K_\sigma f$. But
as $\|f - K_\sigma f\|_\infty$ remains of order $\sigma^2$ when $\beta > 2$, this approach appears to be inefficient for smoother
densities. An alternative strategy is proposed in Kruijer et al. (2010), based on the
following successive convolutions of $f$: $f_0 = f$ and for all $j \ge 0$, $f_{j+1} = f - \Delta_\sigma f_j$. In their paper,
the density is approximated by a discretized version of the continuous mixture $K_\sigma f_k$ where $k \in \mathbb{N}$
is such that $\beta \in (2k, 2k + 2]$.
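The benefit of the successive convolutions can be seen on a toy case where everything is explicit. The sketch below reads the recursion as $f_{j+1} = f - (K_\sigma f_j - f_j)$ and takes $f$ to be the standard normal density, an assumption made purely for convenience: since $\psi_\sigma$ is a centred Gaussian with variance $\sigma^2/2$, the iterate $K_\sigma^h f$ is then the centred normal density with variance $1 + h\sigma^2/2$. The function names are hypothetical.

```python
import math


def normal_pdf(x, var):
    """Centred normal density with variance `var`."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def f_k_coefficients(k):
    """Coefficients c[h] with f_k = sum_h c[h] * K^h f, obtained from
    f_0 = f and f_{j+1} = f - (K f_j - f_j)."""
    coeffs = [1]
    for _ in range(k):
        shifted = [0] + coeffs          # coefficients of K f_j
        padded = coeffs + [0]           # coefficients of f_j
        coeffs = [p - s for p, s in zip(padded, shifted)]
        coeffs[0] += 1                  # add f itself
    return coeffs


def k_sigma_f_k(x, k, sigma):
    """K_sigma f_k at x when f is the standard normal density."""
    return sum(
        c * normal_pdf(x, 1.0 + (h + 1) * sigma * sigma / 2.0)
        for h, c in enumerate(f_k_coefficients(k))
    )


def sup_error(k, sigma):
    """Sup-norm error |K_sigma f_k - f| on a grid of [-6, 6]."""
    grid = [i / 100.0 for i in range(-600, 601)]
    return max(abs(k_sigma_f_k(x, k, sigma) - normal_pdf(x, 1.0)) for x in grid)


for k in (0, 1, 2):
    print(k, sup_error(k, 0.3))
```

In this example the error of $K_\sigma f$ behaves like $\sigma^2$, while the error of $K_\sigma f_1$ behaves like $\sigma^4$: halving $\sigma$ divides it by roughly sixteen rather than four.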

In our framework, Lemma 4 in Kruijer et al. (2010) cannot be directly used since the upper bound
on the Kullback-Leibler divergence between $f$ and the finite Gaussian mixture is not uniform
over $f \in \mathcal{H}(\beta, \mathcal{P})$. Thus some additional work is necessary in order to prove a uniform version of this
approximation result. Another reason for revisiting the approximation results given in Kruijer et al.
(2010) is that they are stated for $\sigma < \bar\sigma$ where $\bar\sigma$ depends on the approximated density $f$.
Thus we also need to check that it is possible to choose the same $\bar\sigma$ for all the densities of $\mathcal{H}(\beta, \mathcal{P})$.
The proof of Theorem 2 consists of carefully following the method of Kruijer et al. (2010) in order
to obtain this uniform version. A sketch of the proof is given below and a self-contained proof is
detailed in Section 3.

Theorem 2. There exists a positive constant $\bar\sigma(\beta) < 1$ such that for all $f \in \mathcal{H}(\beta, \mathcal{P})$ and for all
$\sigma < \bar\sigma(\beta)$, there exists a finite Gaussian mixture of density $p_\sigma$ with less than $G_\beta\, \sigma^{-1} |\ln \sigma|^{3/2}$ support
points, with the same variance $\sigma$ for each component and with means belonging to $[-a_\sigma, a_\sigma]$ where

$$a_\sigma \le \bar{G}_\beta\, |\ln \sigma|^{1/2}$$

such that

$$\mathrm{KL}(f, p_\sigma) \le c_\beta\, \sigma^{2\beta} \tag{8}$$

where $c_\beta$ is uniform on $\mathcal{H}(\beta, \mathcal{P})$ and continuous in $\beta$. The constant $\bar\sigma(\beta)$ only depends on $\mathcal{H}(\beta, \mathcal{P})$
and is a continuous function of $\beta$. Moreover, $G_\beta$ and $\bar{G}_\beta$ are two positive constants that only depend
on $\mathcal{H}(\beta, \mathcal{P})$, and are both increasing functions of $\beta$.

The two constants $G_\beta$ and $\bar{G}_\beta$ are explicitly defined by Equations (55) and (56) in the proof of
Theorem 2 in Section 3.2.

Sketch of the proof. Let $f$ be a density in a given class $\mathcal{H}(\beta, \mathcal{P})$. First, the convolution $K_\sigma f_k$ is shown
to be close to $f$ on a subset of $\mathbb{R}$ where the derivative functions of $\ln f$ and $L$ are efficiently
controlled (see Lemma 1). On this subset, the difference $K_\sigma f_k - f$ is controlled by $f(x) R_f(x) O(\sigma^\beta)$,
apart from a term $\sigma^H$ where $H$ can be arbitrarily large. The term $O(\sigma^\beta)$ is uniform on $\mathcal{H}(\beta, \mathcal{P})$ and
$R_f$ is a polynomial function of $L$ and the derivative functions of $\ln f$. Next, since $f_k$ is not necessarily
a positive function, a density function $h_k$ is defined from $f_k$. The previous result is then adapted
for controlling $K_\sigma h_k - f$ on a more restrictive subset of $\mathbb{R}$ (see Lemma 2). Based on this result, a
control of the Kullback-Leibler divergence between $f$ and the continuous Gaussian mixture $K_\sigma h_k$
is obtained in Proposition 2: $\mathrm{KL}(f, K_\sigma h_k) \le c_\beta\, \sigma^{2\beta}$ where $c_\beta$ is a multiplicative constant uniform
on $\mathcal{H}(\beta, \mathcal{P})$. Finally, using a discretization result, a similar control is obtained for $\mathrm{KL}(f, p_\sigma)$ where
$p_\sigma$ is a finite Gaussian mixture fulfilling the conditions given in Theorem 2. □
2.3. Lower bound

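In order to show that the penalized MLE $\hat{s}_{\hat{m}}$ is adaptive to the smoothness parameter $\beta$,
a lower bound of the minimax risk $\mathcal{R}_n(\mathcal{H}(\beta, \mathcal{P}))$ is required. For all $0 < \underline\beta < \bar\beta$, a "large enough"
parameter set $\mathcal{P}(\underline\beta, \bar\beta)$ is found such that for all $\beta \in [\underline\beta, \bar\beta]$, $\mathcal{H}(\beta, \mathcal{P}(\underline\beta, \bar\beta))$ is well defined and a lower
bound is given for the density classes $\mathcal{H}(\beta, \mathcal{P}(\underline\beta, \bar\beta))$. Note that in Theorem 2, the constants $c_\beta$,
$\bar\sigma(\beta)$, $G_\beta$ and $\bar{G}_\beta$ cannot be bounded uniformly for all $\beta \in \mathbb{R}_+$. Nevertheless, it can be proved that
$\hat{s}_{\hat{m}}$ is minimax adaptive on a range of regularity $[\underline\beta, \bar\beta]$.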

First, the parameter set $\mathcal{P}(\underline\beta, \bar\beta)$ has to be defined rigorously. Its definition is rather technical
since it depends on the way the lower bound is proved. The proof is based on the construction of
some oscillating functions; this standard method is presented for instance in Massart (2007,
Section 7.5). Let us take some infinitely differentiable function $\varphi : \mathbb{R} \to \mathbb{R}$ with compact support
included in $(1/4, 3/4)$ such that

$$\int_{\mathbb{R}} \varphi(x)\, dx = 0 \quad \text{and} \quad \int_{\mathbb{R}} \varphi(x)^2\, dx = 1.$$

We set $A = \max_{0 \le k \le r+1} \|\varphi^{(k)}\|_\infty > 1$ and let $D$ be some positive even integer. For any positive integer
$j \in \{1, \ldots, D\}$, we consider the function

$$\varphi_j : \mathbb{R} \to \mathbb{R}, \qquad x \mapsto [\ldots]\, \varphi\big( [\ldots] - (j - 1) \big).$$

Moreover, let $T(\alpha, \xi)$ be the space of functions $\omega : \mathbb{R} \to \mathbb{R}_+$ such that $\omega$ is nondecreasing on
$(-\infty, [\ldots])$, nonincreasing on $([\ldots], +\infty)$, $\omega(x) = 2\xi$ for all $x \in [\ldots]$, and $\omega(-\alpha) = \omega(\alpha) = \xi$.

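Next, let $\mathcal{P} = \{\gamma, \ln(2\xi), L, \varepsilon, C, \alpha, \xi, M\}$ be a parameter set such that $T(\alpha, \xi) \cap \mathcal{H}(\beta, \mathcal{P})$ is
nonempty. Based on a function $\omega \in T(\alpha, \xi) \cap \mathcal{H}(\beta, \mathcal{P})$ and the functions $\varphi_j$, we consider the
functional space $J(\beta, D) = \{f_\theta;\ \theta \in \{0, 1\}^D\}$ where for all $\theta \in \{0, 1\}^D$ and for all $x \in \mathbb{R}$,

$$f_\theta(x) = \omega(x) + \sum_{j=1}^{D} (2\theta_j - 1)\, \varphi_j(x). \tag{9}$$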

Proposition 1. There exists a parameter set $\mathcal{P}(\underline\beta, \bar\beta)$ such that for all $D \in \mathbb{N}^*$ and for all $\beta \in [\underline\beta, \bar\beta]$,

$$J(\beta, D) \subset \mathcal{H}(\beta, \mathcal{P}(\underline\beta, \bar\beta)).$$

Remark 5. Note that if such a parameter set exists, Proposition 1 is also true for all the parameter
sets larger than it (in the sense given in Remark 4). A key point in the proof of the lower bound stated in
the next theorem is that the parameter set $\mathcal{P}(\underline\beta, \bar\beta)$ does not depend on $D$.

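Theorem 3. Suppose that one observes independent random variables $X_1, \ldots, X_n$ with common
density $s$ with respect to the Lebesgue measure on $\mathbb{R}$. For any $\beta \in [\underline\beta, \bar\beta]$ and any parameter set $\mathcal{P}(\underline\beta, \bar\beta)$
given by Proposition 1, there exists a positive constant $\kappa_\beta$ such that

$$\mathcal{R}_n(\mathcal{H}(\beta, \mathcal{P}(\underline\beta, \bar\beta))) := \inf_{\hat{s}} \sup_{s} \mathbb{E}[d_H^2(s, \hat{s})] \ge \kappa_\beta\, n^{-\frac{2\beta}{2\beta + 1}}$$

where the supremum (resp. the infimum) is taken over all densities $s$ in $\mathcal{H}(\beta, \mathcal{P}(\underline\beta, \bar\beta))$ (resp. over
all possible estimators $\hat{s}$ of $s$).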
Proposition 1 and Theorem 3 are proved in Section 4.1 and Section 4.2 respectively. After
establishing Proposition 1, the Hellinger distance and the Kullback-Leibler divergence between two
functions of $J(\beta, D)$ are controlled in Lemma 5 and Lemma 6 respectively. These controls are
required to combine a corollary of Birgé's lemma (see Birgé, 2005) and the so-called
Varshamov-Gilbert lemma. These last two results can be found in Massart (2007, Corollary 2.19 and
Lemma 4.7) and are recalled in Appendix B.



2.4. Adaptive density estimation



In a non asymptotic model selection approach, the model collection may grow with the sample
size $n$, leading to an adaptive procedure. As already explained, the adaptive properties of
$\hat{s}_{\hat{m}}$ are studied on a range of regularity $[\underline\beta, \bar\beta]$. As a preliminary, we fix $0 < \underline\beta < \bar\beta$ and we also choose
$a_{\bar\beta} > 1$ large enough such that

$$[\ldots] < 1, \tag{10}$$

where $G_{\bar\beta}$ is defined in Theorem 2. The parameters of the Gaussian mixture models $(S_m)_{m \in \mathcal{M}_n}$ are
now specified in order to apply the approximation results provided by Theorem 2:

$$S_m = \Big\{ x \in \mathbb{R} \mapsto \sum_{u=1}^{m} p_u \psi_{\sigma_u}(x - \mu_u);\ \mu_u \in [-\bar\mu(m), \bar\mu(m)],\ \sigma_u^2 \in [\underline\lambda(m), \bar\lambda(m)],\ p_u \in [0, 1],\ \sum_{u=1}^{m} p_u = 1 \Big\}$$

where $\sqrt{\underline\lambda(m)} := a_{\bar\beta}\, m^{-1} (\ln m)^{3/2}$, $\bar\mu(m) = \bar{G}_{\bar\beta}\, |\ln \sqrt{\underline\lambda(m)}|^{1/2}$ and $\bar\lambda(m) \ge \underline\lambda(m)$ for all $m$. Note that
the last parameter $\bar\lambda(m)$ can be taken the same for all $m$ and is denoted $\bar\lambda$ in the sequel. Since an
$n$-sample is observed, it is natural to suppose that the number of mixture components $m$ is less than
$n$ and we also assume that the mixtures have at least two components: $\mathcal{M}_n = \{2, \ldots, n\}$. Note that
when the sample size $n$ increases, mixtures with small component variances and many components
$m$ are available in the model collection. This obviously improves the approximation capacity of the
Gaussian mixtures.
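To see how the model bounds evolve with the number of components, the sketch below reads the definitions above as $\sqrt{\underline\lambda(m)} = a\, m^{-1} (\ln m)^{3/2}$ and $\bar\mu(m) = \bar{G}\, |\ln \sqrt{\underline\lambda(m)}|^{1/2}$. The numerical values given to the constants `a` and `g_bar` are arbitrary placeholders, since the paper's constants are not explicit here.

```python
import math


def sqrt_lambda_low(m, a):
    """Lower bound on the component scale: a * (ln m)^{3/2} / m."""
    return a * math.log(m) ** 1.5 / m


def mu_bar(m, a, g_bar):
    """Bound on the component means: g_bar * |ln sqrt_lambda_low(m)|^{1/2}."""
    return g_bar * math.sqrt(abs(math.log(sqrt_lambda_low(m, a))))


# Placeholder constants, chosen only to make the trend visible.
a, g_bar = 2.0, 1.0
for m in (10, 100, 1000):
    print(m, sqrt_lambda_low(m, a), mu_bar(m, a, g_bar))
```

For large $m$ the smallest admissible scale shrinks almost like $1/m$ while the range allowed for the means grows only like the square root of a logarithm, so richer models mainly gain resolution rather than spatial extent.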

Theorem 4. Assume that $n > 3$ and let $\hat{s}_{\hat{m}}$ be the penalized maximum likelihood estimator
minimizing the penalized criterion defined in Theorem 1. Then there exists a constant $c_{\underline\beta, \bar\beta}$ such
that for all $\beta \in [\underline\beta, \bar\beta]$ and for all $s \in \mathcal{H}(\beta, \mathcal{P}(\underline\beta, \bar\beta))$,

$$\mathbb{E}\big[d_H^2(s, \hat{s}_{\hat{m}})\big] \le c_{\underline\beta, \bar\beta}\, (\ln n)^{\frac{5\beta}{2\beta + 1}}\, n^{-\frac{2\beta}{2\beta + 1}}.$$

Theorem 4 shows that the penalized estimator $\hat{s}_{\hat{m}}$ is adaptive to the regularity $\beta$ of the density
classes defined in Section 2.1, up to a power of $\ln(n)$. This logarithmic term is due to the penalty
shape given in Theorem 1. It is not detected in practice, as shown in Maugis and Michel (2010),
and we suspect that it could be removed from the penalty shape. Note that the nonparametric
Bayesian estimator defined in Kruijer et al. (2010) has a similar rate of convergence with a greater
power of the logarithmic term.



3. Proof of the approximation result 

In this section, the density functional space $\mathcal{H}(\beta, \mathcal{P})$ is fixed. To make the proofs and the results
easier to read, we use the notation $c_\beta$ (resp. $\bar\sigma(\beta)$) for constants (resp. upper bounds on $\sigma$)
that only depend on $\beta$ and $\mathcal{P}$. We also use the notation $c_{\beta, p}$ (resp. $\bar\sigma(\beta, p)$) if it also depends on
another parameter $p$. Moreover, we introduce the following notation: for any nonnegative integer
$h$, the $h$-fold convolution of the Gaussian kernel $\psi$ is denoted $\psi^{*h}$ and for any nonnegative $t$, the
$t$-th moment of $\psi^{*h}$ is defined by $\nu_{h,t} = \int x^t \psi^{*h}(x)\, dx$. We also denote by $l_j$ the $j$th derivative
$\frac{d^j}{dx^j} \ln f(x)$ of $\ln f$ and we consider a subset $A_\sigma$ defined by

$$A_\sigma := \big\{ x \in \mathbb{R};\ |l_j(x)| \le \mathfrak{B}\, \sigma^{-j} |\ln \sigma|^{-j/2},\ j = 1, \ldots, r,\ L(x) \le \mathfrak{B}\, \sigma^{-\beta} |\ln \sigma|^{-\beta/2} \big\}$$

if $\beta > 1$ and $A_\sigma := \{ x \in \mathbb{R};\ L(x) \le \mathfrak{B}\, \sigma^{-\beta} |\ln \sigma|^{-\beta/2} \}$ otherwise.

3.1. Approximation by a continuous mixture 

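Lemma 1. Let $\beta > 0$ and $k \in \mathbb{N}$ such that $\beta \in (2k, 2k + 2]$. For all $H > 0$, there exists $\bar\sigma(\beta, H) > 0$
such that for all $\sigma < \bar\sigma(\beta, H)$, for all $f \in \mathcal{H}(\beta, \mathcal{P})$ and for all $x \in A_\sigma$ we have

$$(K_\sigma f_k)(x) = f(x) \big[ 1 + R_f(x)\, O_{\beta, H}(\sigma^\beta) \big] + O_{\beta, H}(\sigma^H)$$

with $R_f(x) = a_{r+1} L(x)$ if $\beta \le 1$, and

$$R_f(x) = a_{r+1} L(x) + \sum_{j=1}^{r} a_j\, |l_j(x)|^{\beta/j}$$

otherwise. In both cases, the $a_j$'s are nonnegative constants that are uniform on $\mathcal{H}(\beta, \mathcal{P})$.
Furthermore, $\bar\sigma(\beta, H)$ is a continuous function of $\beta$ and $H$.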

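Proof. Let $H > 0$ and $f \in \mathcal{H}(\beta, \mathcal{P})$. If $\beta > 1$, for all $x$ and $y$ such that $|y - x| \le \gamma$, there exists $\rho$
such that $|x - \rho| \le |x - y|$ and

$$\ln f(y) = \ln f(x) + \sum_{j=1}^{r} \frac{l_j(x)}{j!} (y - x)^j + \frac{l_r(\rho) - l_r(x)}{r!} (y - x)^r.$$

Then, the smoothness condition (4) implies, since $|\rho - x| \le |y - x| \le \gamma$, that

$$\Big| \ln f(y) - \ln f(x) - \sum_{j=1}^{r} \frac{l_j(x)}{j!} (y - x)^j \Big| \le L(x)\, |\rho - x|^{\beta - r}\, |y - x|^r \le L(x)\, |y - x|^\beta.$$

Thus we have

$$\ln f(y) \le \ln f(x) + \bar{B}(x, y) \tag{11}$$

and

$$\ln f(y) \ge \ln f(x) + \underline{B}(x, y) \tag{12}$$

with $\bar{B}(x, y) = \sum_{j=1}^{r} \frac{l_j(x)}{j!} (y - x)^j + L(x) |y - x|^\beta$ and $\underline{B}(x, y) = \sum_{j=1}^{r} \frac{l_j(x)}{j!} (y - x)^j - L(x) |y - x|^\beta$. Note
that for $\beta \le 1$, (11) and (12) are valid with $\bar{B}(x, y) = -\underline{B}(x, y) = L(x) |x - y|^\beta$.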
Let $x \in A_\sigma$ and $y \in D_x := \{ y \in \mathbb{R};\ |y - x| \le k' \sigma |\ln \sigma|^{1/2} \}$ where $k'$, chosen below, has to be
identical for all $f \in \mathcal{H}(\beta, \mathcal{P})$ and we also assume that $\sigma$ is small enough to satisfy

$$k' \sigma |\ln \sigma|^{1/2} \le \gamma. \tag{13}$$

Then (11) gives that for all $y \in D_x$, $f(y) \le f(x) \exp[\bar{B}(x, y)]$ and thus

$$K_\sigma f(x) \le f(x) \int_{D_x} e^{\bar{B}(x, y)} \psi_\sigma(y - x)\, dy + \int_{D_x^c} f(y)\, \psi_\sigma(y - x)\, dy. \tag{14}$$



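For the sequel, note that, for $x \in A_\sigma$ and $y \in D_x$, if $\beta > 1$,

$$|\bar{B}(x, y)| \le \sum_{j=1}^{r} \frac{k'^j}{j!}\, \sigma^j |\ln \sigma|^{j/2}\, \mathfrak{B}\, \sigma^{-j} |\ln \sigma|^{-j/2} + k'^\beta \sigma^\beta |\ln \sigma|^{\beta/2}\, \mathfrak{B}\, \sigma^{-\beta} |\ln \sigma|^{-\beta/2} \le \mathfrak{B} \Big( \sum_{j=1}^{r} \frac{k'^j}{j!} + k'^\beta \Big) := d_1(\beta, k')$$

and thus

$$e^{\bar{B}(x, y)} \le \sum_{j=0}^{r} \frac{1}{j!} \bar{B}^j(x, y) + |\bar{B}^{r+1}(x, y)| \sum_{j \ge 0} \frac{1}{(j + r + 1)!} |\bar{B}^j(x, y)|$$
$$\le \sum_{j=0}^{r} \frac{1}{j!} \bar{B}^j(x, y) + |\bar{B}^{r+1}(x, y)| \sum_{j \ge 0} \frac{1}{j!} |\bar{B}^j(x, y)|$$
$$\le \sum_{j=0}^{r} \frac{1}{j!} \bar{B}^j(x, y) + d_2(\beta, k')\, |\bar{B}^{r+1}(x, y)| \tag{15}$$

with $d_2(\beta, k') = \exp[d_1(\beta, k')]$.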
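Inequality (15) is an elementary bound on the remainder of the exponential series, valid whenever $|b| \le d_1$. The following sketch checks it numerically on a grid; the function names and the tested values of $r$ and $d_1$ are arbitrary choices made for illustration.

```python
import math


def taylor_upper_bound(b, r, d1):
    """Bound of the form (15): sum_{j<=r} b^j / j! + exp(d1) * |b|^(r+1)."""
    partial = sum(b ** j / math.factorial(j) for j in range(r + 1))
    return partial + math.exp(d1) * abs(b) ** (r + 1)


def check_bound(r, d1, steps=2000):
    """Check exp(b) <= bound for every b on a regular grid of [-d1, d1]."""
    grid = [-d1 + 2.0 * d1 * i / steps for i in range(steps + 1)]
    # A tiny tolerance absorbs floating-point rounding near b = 0.
    return all(math.exp(b) <= taylor_upper_bound(b, r, d1) + 1e-12 for b in grid)


print(all(check_bound(r, d1) for r in (1, 3, 5) for d1 in (0.5, 2.0, 5.0)))
```

The check relies on the same argument as the text: the tail of the series is at most $|b|^{r+1} e^{|b|}$, and $e^{|b|} \le e^{d_1}$ on the grid.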



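Case k=0: We consider $\beta \in (1, 2]$ and thus $r = 1$. The case $\beta \in (0, 1]$ is discussed hereafter. We
have $\bar{B}(x, y) = l_1(x)(y - x) + L(x)|y - x|^\beta$ and (15) yields

$$e^{\bar{B}(x, y)} \le 1 + \bar{B}(x, y) + d_2(\beta, k')\, \bar{B}^2(x, y)$$
$$\le 1 + l_1(x)(y - x) + L(x)|y - x|^\beta + d_2(\beta, k') \big[ l_1(x)^2 (y - x)^2 + 2 L(x) l_1(x) |y - x|^\beta (y - x) + L^2(x) |y - x|^{2\beta} \big]$$
$$\le 1 + l_1(x)(y - x) + L(x)|y - x|^\beta + d_2(\beta, k') \big[ (\mathfrak{B} k')^{2 - \beta} |l_1(x)|^\beta |y - x|^\beta + 2 \mathfrak{B} k'^\beta\, l_1(x)(y - x) + \mathfrak{B} k'^\beta L(x) |y - x|^\beta \big]$$

since $|l_1(x)(y - x)| \le \mathfrak{B} k'$ and $|L(x)(y - x)^\beta| \le \mathfrak{B} k'^\beta$. Since $\psi_\sigma$ is symmetric, $\int_{D_x} (y - x)\, \psi_\sigma(y - x)\, dy = 0$
and thus,

$$\int_{D_x} e^{\bar{B}(x, y)} \psi_\sigma(y - x)\, dy \le 1 + L(x) \int_{D_x} \psi_\sigma(y - x) |y - x|^\beta\, dy + d_2(\beta, k') \big[ (\mathfrak{B} k')^{2 - \beta} |l_1(x)|^\beta + \mathfrak{B} k'^\beta L(x) \big] \int_{D_x} \psi_\sigma(y - x) |y - x|^\beta\, dy$$
$$\le 1 + \big\{ d_2(\beta, k') (\mathfrak{B} k')^{2 - \beta} |l_1(x)|^\beta + [1 + d_2(\beta, k')\, \mathfrak{B} k'^\beta]\, L(x) \big\}\, \nu_{1, \beta}\, \sigma^\beta$$
$$\le 1 + d_3(\beta, k') \big[ 2 L(x) + |l_1(x)|^\beta \big] \sigma^\beta. \tag{16}$$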
Let $k' = k'(2, 0, 1, H)$ be given by Lemma 8, and $\bar\sigma(\beta, H)$ such that (13) is satisfied. Note that $\bar\sigma(\beta, H)$
can be taken as a continuous function of $\beta$ and $H$. Then, for the second integral in the right-hand
side of (14), using (7), it gives

$$\int_{D_x^c} f(y)\, \psi_\sigma(y - x)\, dy = \int f(y)\, \psi_\sigma(y - x)\, \mathbf{1}_{|y - x| > k' \sigma |\ln \sigma|^{1/2}}(y)\, dy \le M \int_{|u| > k' |\ln \sigma|^{1/2}} \psi(u)\, du \le c_{\beta, H}\, \sigma^H. \tag{17}$$

Furthermore, since $k'$ depends on $H$, $d_3(\beta, k')$ can also be rewritten as a constant $c_{\beta, H}$. Finally,
(14), (16) and (17) give that

$$K_\sigma f(x) \le f(x) \big[ 1 + c_{\beta, H} R_f(x)\, \sigma^\beta \big] + c_{\beta, H}\, \sigma^H$$

with $R_f(x) = 2 L(x) + |l_1(x)|^\beta$.

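For $\beta \in (0, 1]$, (14) is still valid with $\bar{B}(x, y) = L(x)|y - x|^\beta$. For $x \in A_\sigma$, the first integral in
(14) can be treated as in the case $\beta \in (1, 2]$: it yields

$$\int_{D_x} \exp\big( L(x) |x - y|^\beta \big)\, \psi_\sigma(x - y)\, dy \le 1 + c_\beta\, L(x)\, \nu_{1, \beta}\, \sigma^\beta.$$

Using Lemma 8 as before, it gives

$$\int_{D_x^c} f(y)\, \psi_\sigma(x - y)\, dy \le c_{\beta, H}\, \sigma^H$$

and finally, for all $x \in A_\sigma$,

$$K_\sigma f(x) \le f(x) \big[ 1 + c_{\beta, H} L(x)\, \sigma^\beta \big] + c_{\beta, H}\, \sigma^H.$$

A similar lower bound can be shown in the same way; it is proved in the general case below.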

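Case k=1: We consider $\beta \in (3, 4]$; a similar proof gives the result for $\beta \in (2, 3]$. According to
(15), for $x \in A_\sigma$ and $y \in D_x$,

$$e^{\bar{B}(x, y)} \le 1 + \bar{B}(x, y) + \tfrac{1}{2} \bar{B}(x, y)^2 + \tfrac{1}{6} \bar{B}(x, y)^3 + d_2(\beta, k')\, \bar{B}(x, y)^4 \tag{18}$$

with $\bar{B}(x, y) = l_1(x)(y - x) + \frac{1}{2} l_2(x)(y - x)^2 + \frac{1}{6} l_3(x)(y - x)^3 + L(x)|y - x|^\beta$. Thus $\exp[\bar{B}(x, y)]$ is
upper bounded by a linear combination of terms of the form

$$\big[ L(x) |x - y|^\beta \big]^{\eta_4} \prod_{j=1}^{3} \big[ l_j(x) (y - x)^j \big]^{\eta_j}$$

with $\sum_{j=1}^{4} \eta_j \le 4$. Let $A_1(x, y)$ be the sum of such terms for which $\eta_4 \beta + \sum_{j=1}^{3} j \eta_j < \beta$ and let
$A_2(x, y)$ be the other terms, and thus $e^{\bar{B}(x, y)} \le A_1(x, y) + A_2(x, y)$. Note that the constant $d_2(\beta, k')$
only appears in the terms of $A_2(x, y)$. By removing inside $A_1(x, y)$ all the terms for which the power
of $(y - x)$ is an odd integer (since $\int_{\mathbb{R}} u^t \psi_\sigma(u)\, du = 0$ if $t$ is an odd integer), it yields

$$\int_{D_x} A_1(x, y)\, \psi_\sigma(y - x)\, dy = \int_{D_x} \psi_\sigma(y - x) \Big\{ 1 + \tfrac{1}{2} \big[ l_1^2(x) + l_2(x) \big] (y - x)^2 \Big\}\, dy \le 1 + \frac{\nu_{1,2}}{2} \big[ l_1^2(x) + l_2(x) \big] \sigma^2. \tag{19}$$

Next, for each term of $A_2(x, y)$, we have for all $x \in A_\sigma$ and all $y \in D_x$

$$\big[ L(x) |x - y|^\beta \big]^{\eta_4} \prod_{j=1}^{3} \big| l_j(x) (y - x)^j \big|^{\eta_j} = \Big\{ \big[ L(x) |x - y|^\beta \big]^{\eta_4} \prod_{j=1}^{3} \big| l_j(x) (x - y)^j \big|^{\eta_j} \Big\}^{1 - \frac{\beta}{\eta_4 \beta + \sum_{j=1}^{3} j \eta_j}} \Big\{ L(x)^{\eta_4} \prod_{j=1}^{3} |l_j(x)|^{\eta_j} \Big\}^{\frac{\beta}{\eta_4 \beta + \sum_{j=1}^{3} j \eta_j}} |x - y|^\beta$$

and finally

$$\big[ L(x) |x - y|^\beta \big]^{\eta_4} \prod_{j=1}^{3} \big| l_j(x) (y - x)^j \big|^{\eta_j} \le c_\beta \Big\{ L(x)^{\eta_4} \prod_{j=1}^{3} |l_j(x)|^{\eta_j} \Big\}^{\frac{\beta}{\eta_4 \beta + \sum_{j=1}^{3} j \eta_j}} |x - y|^\beta$$

according to Lemma 11. Note that

$$\Big\{ L(x)^{\eta_4} \prod_{j=1}^{3} |l_j(x)|^{\eta_j} \Big\}^{\frac{\beta}{\eta_4 \beta + \sum_{j=1}^{3} j \eta_j}} \le \frac{\beta}{\eta_4 \beta + \sum_{j=1}^{3} j \eta_j} \Big\{ \eta_4 L(x) + \sum_{j=1}^{3} \frac{j \eta_j}{\beta} |l_j(x)|^{\beta/j} \Big\}$$

since the logarithm function is concave. According to Lemma 11, for each term of $A_2(x, y)$ we thus
have

$$\big[ L(x) |x - y|^\beta \big]^{\eta_4} \prod_{j=1}^{3} \big| l_j(x) (y - x)^j \big|^{\eta_j} \le \frac{c_\beta\, \beta}{\eta_4 \beta + \sum_{j=1}^{3} j \eta_j}\, |x - y|^\beta \Big\{ \eta_4 L(x) + \sum_{j=1}^{3} \frac{j \eta_j}{\beta} |l_j(x)|^{\beta/j} \Big\}$$

and then

$$|A_2(x, y)| \le c_{\beta, k'} \Big\{ a_4 L(x) + \sum_{j=1}^{3} a_j |l_j(x)|^{\beta/j} \Big\} |x - y|^\beta$$

where $c_{\beta, k'}$ comes from $d_2(\beta, k')$ in (18) and where the $a_j$'s are positive constants that only depend
on $\mathcal{H}(\beta, \mathcal{P})$. It leads to

$$\int_{D_x} |A_2(x, y)|\, \psi_\sigma(y - x)\, dy \le c_{\beta, k'}\, R_f(x)\, \sigma^\beta \tag{20}$$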



imsart-generic ver. 2011/01/24 file: PreprintMauglsMichel.tex date: January 15, 2013 



Maugis and Michel/ Adaptive density estimation using finite Gaussian mixtures 13 

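where $R_f(x) = a_4 L(x) + \sum_{j=1}^{3} a_j |l_j(x)|^{\beta/j}$. Finally, (19) and (20) together yield

$$\int_{D_x} e^{\bar{B}(x, y)} \psi_\sigma(y - x)\, dy \le 1 + \frac{\nu_{1,2}}{2} \big\{ l_1^2(x) + l_2(x) \big\} \sigma^2 + c_{\beta, k'}\, R_f(x)\, \sigma^\beta.$$

Let $k' = k'(2, 0, 1, H)$ be given by Lemma 8, and $\bar\sigma(\beta, H)$ such that (13) is satisfied and where $\bar\sigma(\beta, H)$
can be taken as a continuous function of $\beta$ and $H$. Next, for all $f \in \mathcal{H}(\beta, \mathcal{P})$, and all $\sigma < \bar\sigma(\beta, H)$,

$$\int_{D_x^c} f(y)\, \psi_\sigma(y - x)\, dy \le c_{\beta, H}\, \sigma^H.$$

Finally, for $\sigma < \bar\sigma(\beta, H)$,

$$(K_\sigma f)(x) \le f(x) \Big[ 1 + \frac{\nu_{1,2}}{2} \big\{ l_1^2(x) + l_2(x) \big\} \sigma^2 + c_{\beta, H}\, R_f(x)\, \sigma^\beta \Big] + c_{\beta, H}\, \sigma^H$$

and the similar lower bound is obtained in the same way (see the general case below). Thus,

$$(K_\sigma f)(x) = f(x) \Big[ 1 + \frac{\nu_{1,2}}{2} \big\{ l_1^2(x) + l_2(x) \big\} \sigma^2 + R_f(x)\, O_{\beta, H}(\sigma^\beta) \Big] + O_{\beta, H}(\sigma^H). \tag{21}$$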

Now, we need a similar result for $f_1$ instead of $f$. Equation (21) depends on the kernel $\psi$ through
the value of $\nu_{1,2}$. In fact, it holds for any symmetric kernel $\phi$ such that $\int \phi(x) x^t\, dx = \nu_t < \infty$
and $\int_{|x| > k' |\ln \sigma|^{1/2}} \phi(x) |x|^t\, dx = O_{\beta, H}(\sigma^H)$ when $k'$ is large enough. For $\psi^{*2}$, these properties follow
from Lemma 8: let $k'$ and $\bar\sigma(\beta, H)$ be such that (13) is satisfied and where $\bar\sigma(\beta, H)$ is a continuous
function of $\beta$ and $H$. Thus, denoting $\nu_{2,u} = \int x^u \psi^{*2}(x)\, dx$, for all $\sigma < \bar\sigma(\beta, H)$,

$$(K_\sigma^2 f)(x) = f(x) \Big[ 1 + \frac{\nu_{2,2}}{2} \big\{ l_1^2(x) + l_2(x) \big\} \sigma^2 + R_f(x)\, O_{\beta, H}(\sigma^\beta) \Big] + O_{\beta, H}(\sigma^H).$$

Now, since $f_1 = 2f - K_\sigma f$ and $\nu_{2,2} = 2 \nu_{1,2}$, it yields for all $\sigma < \bar\sigma(\beta, H)$ that

$$(K_\sigma f_1)(x) = f(x) \big[ 1 + R_f(x)\, O_{\beta, H}(\sigma^\beta) \big] + O_{\beta, H}(\sigma^H).$$

General case: Let $\beta \in (2k, 2k+2]$. We give the main ideas of the proof in the general case. According to (15), for $x \in A_\sigma$ and $y \in D_x$, $\exp[B(x,y)]$ is upper bounded by a linear combination of terms of the form

$$\left[L(x)|x-y|^\beta\right]^{\eta_{r+1}} \prod_{j=1}^{r}\left[l_j(x)(y-x)^j\right]^{\eta_j}$$

with $\sum_j \eta_j \le r+1$. We then decompose $e^{B(x,y)}$ into $A_1(x,y)$ and $A_2(x,y)$ as before. By removing inside $A_1(x,y)$ all the terms for which the power of $(y-x)$ is an odd integer, it yields

$$\int_{D_x} A_1(x,y)\psi_\sigma(y-x)\,dy \le 1 + \sum_{u=1}^{k} \nu_{1,2u}\,Q_u(x)\sigma^{2u} \qquad (22)$$

where the $Q_u$'s are positive functions that can be expressed in terms of $L$ and the $l_u$'s. Following the same method as for $\beta \in (3,4]$, it yields



$$|A_2(x,y)| \le \Big\{a_{r+1}L(x) + \sum_{j=1}^{r} a_j|l_j(x)|^{\beta/j}\Big\}\,|x-y|^\beta \qquad (23)$$

and

$$\int_{D_x} |A_2(x,y)|\,\psi_\sigma(y-x)\,dy \le c_{\beta,k'}\,R_f(x)\sigma^\beta \qquad (24)$$

where $R_f(x) = a_{r+1}L(x) + \sum_{j=1}^{r} a_j|l_j(x)|^{\beta/j}$. Finally, (22) and (24) together yield

$$\int_{D_x} e^{B(x,y)}\psi_\sigma(y-x)\,dy \le 1 + \sum_{u=1}^{k}\nu_{1,2u}\,Q_u(x)\sigma^{2u} + c_{\beta,k'}\,R_f(x)\sigma^\beta.$$

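Using Lemma 8, let $k'$ depending on $H$ and $\sigma(\beta,H) > 0$ be such that (13) is satisfied, where $\sigma(\beta,H)$ is a continuous function of $\beta$ and $H$. For all $\sigma < \sigma(\beta,H)$,

$$\int_{D_x^c} f(y)\psi_\sigma(y-x)\,dy \le c_{\beta,H}\,\sigma^H.$$

Finally, since $k'$ depends on $H$,

$$(K_\sigma f)(x) \le f(x)\left[1 + \sum_{u=1}^{k}\nu_{1,2u}\,Q_u(x)\sigma^{2u} + c_{\beta,H}R_f(x)\sigma^\beta\right] + c_{\beta,H}\,\sigma^H$$

and the similar lower bound is obtained in the same way (see below). Thus,

$$(K_\sigma f)(x) = f(x)\left[1 + \sum_{u=1}^{k}\nu_{1,2u}\,Q_u(x)\sigma^{2u} + R_f(x)O_{\beta,H}(\sigma^\beta)\right] + O_{\beta,H}(\sigma^H).$$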



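Now, we need a similar result for $f_k$ instead of $f$. According to Lemma 12,

$$K_\sigma f_k = \sum_{i=0}^{k} (-1)^i \binom{k+1}{i+1} K_\sigma^{i+1} f. \qquad (25)$$

For all $h \le k$, the same method can be applied with $\psi^{*h}$ instead of $\psi$ and it yields, with the same functions $Q_u$ and $R_f$, for $\sigma < \sigma(\beta,H)$,

$$(K_\sigma^h f)(x) = f(x)\left[1 + \sum_{u=1}^{k}\nu_{h,2u}\,Q_u(x)\sigma^{2u} + R_f(x)O_{\beta,H}(\sigma^\beta)\right] + O_{\beta,H}(\sigma^H).$$

According to (25), for $\sigma < \sigma(\beta,H)$,

$$\begin{aligned}
K_\sigma f_k(x) &= \sum_{i=0}^{k}(-1)^i\binom{k+1}{i+1}K_\sigma^{i+1}f(x) \\
&= \sum_{j=1}^{k+1}(-1)^{j+1}\binom{k+1}{j} f(x)\left[1 + \sum_{u=1}^{k}\nu_{j,2u}\,Q_u(x)\sigma^{2u} + R_f(x)O_{\beta,H}(\sigma^\beta)\right] + O_{\beta,H}(\sigma^H) \\
&= f(x)\left\{\sum_{j=1}^{k+1}(-1)^{j+1}\binom{k+1}{j} + \sum_{u=1}^{k}\left[\sum_{j=1}^{k+1}(-1)^{j+1}\binom{k+1}{j}\nu_{j,2u}\right]Q_u(x)\sigma^{2u} + R_f(x)O_{\beta,H}(\sigma^\beta)\right\} + O_{\beta,H}(\sigma^H)
\end{aligned}$$

and then

$$K_\sigma f_k(x) = f(x)\left\{1 + R_f(x)O_{\beta,H}(\sigma^\beta)\right\} + O_{\beta,H}(\sigma^H)$$

since $\sum_{i=0}^{k}(-1)^i\binom{k+1}{i+1} = 1$ and $\sum_{j=1}^{k+1}(-1)^{j+1}\binom{k+1}{j}\nu_{j,2u} = 0$ according to Lemma 9.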
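The effect of the alternating combination (25) can be checked numerically in a special case. The sketch below assumes that $f$ is the standard normal density and that the kernel $\psi$ is the standard Gaussian, so that $K_\sigma^{i+1} f$ is the $N(0, 1 + (i+1)\sigma^2)$ density in closed form; the function names are illustrative and not taken from the paper. Under these assumptions the relative error of $K_\sigma f_k$ should shrink like $\sigma^{2k+2}$.

```python
import math
from math import comb


def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def smoothed_fk(x, sigma, k):
    """(K_sigma f_k)(x) for f = N(0, 1), using the expansion (25)."""
    # Assumption: Gaussian kernel, so K_sigma^{i+1} f is the
    # N(0, 1 + (i+1) * sigma^2) density.
    return sum(
        (-1) ** i * comb(k + 1, i + 1) * normal_pdf(x, 1.0 + (i + 1) * sigma ** 2)
        for i in range(k + 1)
    )


def relative_error(x, sigma, k):
    """|K_sigma f_k(x) - f(x)| / f(x)."""
    fx = normal_pdf(x, 1.0)
    return abs(smoothed_fk(x, sigma, k) - fx) / fx


if __name__ == "__main__":
    for k in (0, 1, 2):
        print(k, relative_error(0.5, 0.2, k), relative_error(0.5, 0.1, k))
```

Halving $\sigma$ should divide the error by roughly $4$ for $k = 0$ and by roughly $16$ for $k = 1$, which is the behaviour the binomial weights are designed to produce.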

To complete this proof, we give the method for obtaining the lower bound in the general case. 
Using (12) and proceeding in the same way as for the upper bound, it yields 



K„f{x) > /(*) / e B ^^(y - x)dy 



> f(x) 



D , 



l + J2\&(x,y)-d2(P,k') B r+1 (x,y) 



3=1 



ipv(y - x)dy 



> f(x) Ai{x,y)ip a {y - x)dy + f(x) I A 2 {x,y)ip rT {y - x)dy (26) 

JD X J L> x 

where $A_1(x,y)$ (resp. $A_2(x,y)$) contains the terms whose powers are less than $\beta$ (resp. larger than $\beta$). For the first integral,



Ai(x,y)il)„(y ~ x)dy 



Ai{x,y)ij) a {y - x)dy - I Ai{x,y)ij) a (y - x)dy 



> 1 + ^ vi,2uQu(x)cr 2 



Ai{x,y)^ a (y - x)dy 



Note that $A_1(x,y)$ is a linear combination of terms of the form $\prod_{j=1}^{r}\left[l_j(x)(x-y)^j\right]^{\eta_j}$, where $\sum_{j=1}^{r} j\,\eta_j$ is even. Since $x \in A_\sigma$, and since $|x-y| \le 1$, we can find $h > 0$ and a constant $c_\beta$ that only depends on $\beta$ such that

$$|A_1(x,y)| \le c_\beta\,\sigma^{-h}(x-y)^2.$$



Finally, 



Ai{x,y)il) a {y - x)dy 



D>; 



<cpa / (x-y) ip (T (y-x)dy. 



Then we apply Lemma 8 with H' > h + H and it gives that 



Ai(x,y)il) a (y - x)dy 



To find a lower bound for the second integral in (26), we note that according to (23), 
\A 2 {x,y)\ > -cp, H la r+1 L(x) +J2a j \l j (x)\^ \ \x-yf 



and thus 



D x 



\^2{x,y)\tp a {y - x)dy < cp, H Rf{x)a p 
where $R_f$ is defined as before. We finally obtain that for $\sigma < \sigma(\beta,H)$,



K a f{x) > f(x) 



1 + "l,2uQu(x)<J 2U + Cp,H Rf{xW Z 



u=l 



Cp,HCr 



□ 



For a density $f$ belonging to $\mathcal{H}(\beta,\mathcal{P})$, Lemma 1 shows that the convolution $K_\sigma f_k$ is close to $f$ on a subspace of $\mathbb{R}$ where the derivatives of $\ln f$ and $L$ are efficiently controlled. Furthermore, the control on the difference $K_\sigma f_k - f$ is uniform over $\mathcal{H}(\beta,\mathcal{P})$, which is required to upper bound the Kullback-Leibler divergence between $f$ and $K_\sigma f_k$. Thus $K_\sigma f_k$ seems to be a good candidate to approximate the density function $f$. Nevertheless, the function $f_k$ is not a density function: its integral over $\mathbb{R}$ is equal to 1 (see Lemma 12) but it can take negative values. To remedy this problem, Kruijer et al. (2010) define a density function $h_k$ as follows. Considering the subspace

$$J_{\sigma,k} = \left\{x \in \mathbb{R};\ f_k(x) > \tfrac{1}{2} f(x)\right\},$$

the following positive function is defined:

$$\forall x \in \mathbb{R},\quad g_k(x) = f_k(x)\,1_{J_{\sigma,k}}(x) + \tfrac{1}{2} f(x)\,1_{J_{\sigma,k}^c}(x),$$

and it is normalized to obtain a density function:

$$\forall x \in \mathbb{R},\quad h_k(x) = \frac{g_k(x)}{\int g_k(u)\,du}. \qquad (27)$$

Note that the constant 1/2 is arbitrary in the definition of $J_{\sigma,k}$; any other number in $(0,1)$ could be used.
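The construction (27) can be illustrated on a toy case. The sketch below assumes $k = 1$, a standard normal $f$ and a standard Gaussian kernel, so that $f_1 = 2f - K_\sigma f$ has a closed form; the grid bounds and function names are choices made for this example only. It shows that $f_1$ becomes negative in the tails, that $g_1$ stays positive, and that the normalizing constant is very close to 1.

```python
import math


def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def f(x):
    """Target density, assumed here to be N(0, 1)."""
    return normal_pdf(x, 1.0)


def f1(x, sigma):
    """f_1 = 2 f - K_sigma f, with K_sigma f the N(0, 1 + sigma^2) density."""
    return 2.0 * f(x) - normal_pdf(x, 1.0 + sigma ** 2)


def g1(x, sigma):
    """g_1 = f_1 on J = {f_1 > f/2}, and f/2 outside J."""
    fx, f1x = f(x), f1(x, sigma)
    return f1x if f1x > 0.5 * fx else 0.5 * fx


def integral_g1(sigma, lo=-12.0, hi=12.0, n=24000):
    """Midpoint-rule approximation of the integral of g_1 over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(g1(lo + (i + 0.5) * step, sigma) for i in range(n))


def h1(x, sigma, norm=None):
    """Normalized density h_1 = g_1 / integral of g_1, as in (27)."""
    norm = integral_g1(sigma) if norm is None else norm
    return g1(x, sigma) / norm


if __name__ == "__main__":
    sigma = 0.3
    print("f_1(5) =", f1(5.0, sigma))            # negative in the tail
    print("integral of g_1 =", integral_g1(sigma))
```

Passing a precomputed `norm` to `h1` avoids recomputing the integral at every evaluation point.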

Now, the result of Lemma 1 has to be extended to the convolution $K_\sigma h_k$. For this purpose, the integral of $K_\sigma^t f$ for all nonnegative integers $t \le k$ is controlled over $A_\sigma^c$ and $E_\sigma^c$, where $A_\sigma$ is defined by (3) and $E_\sigma = \{x \in \mathbb{R};\ f(x) \ge \sigma^{H_1}\}$ with $H_1 > 4\beta$.

Lemma 2. Let $\beta > 0$ and $k \in \mathbb{N}$ such that $\beta \in (2k, 2k+2]$. There exists $\sigma(\beta,H_1) > 0$ such that for all $\sigma < \sigma(\beta,H_1)$, for all $f \in \mathcal{H}(\beta,\mathcal{P})$ and for all nonnegative integers $t \le k$,

$$\int_{A_\sigma^c} (K_\sigma^t f)(x)\,dx = O_\beta(\sigma^{2\beta}) \qquad (28)$$

and

$$\int_{E_\sigma^c} (K_\sigma^t f)(x)\,dx = O_{\beta,H_1}(\sigma^{2\beta}). \qquad (29)$$

Furthermore, for $\sigma < \sigma(\beta,H_1)$, $A_\sigma \cap E_\sigma \subset J_{\sigma,k}$ and

$$\int_{\mathbb{R}} g_k(x)\,dx = 1 + O_{\beta,H_1}(\sigma^{2\beta}). \qquad (30)$$

Thus, for all $H > 0$, there exists $\sigma(\beta,H_1,H) > 0$ such that for all $\sigma < \sigma(\beta,H_1,H)$ and for all $x \in A_\sigma \cap E_\sigma$,

$$|(K_\sigma h_k)(x) - f(x)| = f(x)R_f(x)\,O_{\beta,H_1,H}(\sigma^\beta) + O_{\beta,H_1,H}(\sigma^H). \qquad (31)$$

Furthermore, $\sigma(\beta,H_1)$ and $\sigma(\beta,H_1,H)$ are both continuous functions of $\beta$, $H_1$ and, for the last one, $H$.

Remark 6. The left term in (30) does not depend on $H_1$ whereas the right term does. Indeed, the presence of $H_1$ here is only technical and by choosing for instance $H_1 = 4\beta + 1$, it gives that there exists a positive constant $\sigma(\beta)$, continuous in $\beta$, such that for all $\sigma < \sigma(\beta)$,

$$\int_{\mathbb{R}} g_k(x)\,dx = 1 + O_\beta(\sigma^{2\beta}). \qquad (32)$$

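Proof. For $\delta < 1$ to be chosen later, let

$$A_{\sigma,\delta} := \left\{x \in \mathbb{R};\ |l_j(x)| \le \delta\mathcal{B}\,\sigma^{-j}|\ln\sigma|^{-j/2},\ \forall j \in \{1,\dots,r\},\ L(x) \le \delta\mathcal{B}\,\sigma^{-\beta}|\ln\sigma|^{-\beta/2}\right\}$$

if $\beta > 1$ and let

$$A_{\sigma,\delta} := \left\{x \in \mathbb{R};\ L(x) \le \delta\mathcal{B}\,\sigma^{-\beta}|\ln\sigma|^{-\beta/2}\right\}$$

otherwise. Note that for all $\delta < 1$, $A_{\sigma,\delta} \subset A_\sigma$. In the sequel we assume that $\beta > 1$, the proof being easily adapted for $\beta \le 1$.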

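Proof of (28):

• Case $t = 0$: If $X \sim f$, then

$$\begin{aligned}
\int_{A_{\sigma,\delta}^c} (K_\sigma^0 f)(x)\,dx &= \int_{A_{\sigma,\delta}^c} f(x)\,dx \\
&\le \sum_{j=1}^{r} P\left(|l_j(X)| > \delta\mathcal{B}\,\sigma^{-j}|\ln\sigma|^{-j/2}\right) + P\left(|L(X)| > \delta\mathcal{B}\,\sigma^{-\beta}|\ln\sigma|^{-\beta/2}\right) \\
&\le \sum_{j=1}^{r} P\left(|l_j(X)|^{\frac{2\beta+\epsilon}{j}} > (\delta\mathcal{B})^{\frac{2\beta+\epsilon}{j}}\sigma^{-2\beta-\epsilon}|\ln\sigma|^{-\frac{2\beta+\epsilon}{2}}\right) + P\left(|L(X)|^{\frac{2\beta+\epsilon}{\beta}} > (\delta\mathcal{B})^{\frac{2\beta+\epsilon}{\beta}}\sigma^{-2\beta-\epsilon}|\ln\sigma|^{-\frac{2\beta+\epsilon}{2}}\right) \\
&\le \sum_{j=1}^{r} P\left(|l_j(X)|^{\frac{2\beta+\epsilon}{j}} > (\delta\mathcal{B})^{\frac{2\beta+\epsilon}{j}}\sigma^{-2\beta}\right) + P\left(|L(X)|^{\frac{2\beta+\epsilon}{\beta}} > (\delta\mathcal{B})^{\frac{2\beta+\epsilon}{\beta}}\sigma^{-2\beta}\right)
\end{aligned}$$

since $\sigma^{-\epsilon}|\ln\sigma|^{-\frac{2\beta+\epsilon}{2}} > 1$ for $\sigma$ small enough (say $\sigma < \sigma(\beta)$). Then, the Markov inequality together with (6) gives

$$P\left(|l_j(X)|^{\frac{2\beta+\epsilon}{j}} > (\delta\mathcal{B})^{\frac{2\beta+\epsilon}{j}}\sigma^{-2\beta}\right) \le (\delta\mathcal{B})^{-\frac{2\beta+\epsilon}{j}}\sigma^{2\beta}\,E\left[|l_j(X)|^{\frac{2\beta+\epsilon}{j}}\right] \le c_\beta\,\sigma^{2\beta}$$

and

$$P\left(|L(X)|^{\frac{2\beta+\epsilon}{\beta}} > (\delta\mathcal{B})^{\frac{2\beta+\epsilon}{\beta}}\sigma^{-2\beta}\right) \le c_\beta\,\sigma^{2\beta}.$$

Finally, since $A_\sigma^c \subset A_{\sigma,\delta}^c$,

$$\int_{A_\sigma^c} (K_\sigma^0 f)(x)\,dx \le c_\beta\,\sigma^{2\beta}.$$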
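• Case $t = 1$: Let $X \sim f$ and $U \sim \psi$, then $X + \sigma U \sim K_\sigma f$. By applying Lemma 8 with $H = 2\beta$, let $k'$ depending on $\beta$ and $\sigma(\beta)$ be such that for all $\sigma < \sigma(\beta)$, $k'\sigma|\ln\sigma|^{1/2} \le \gamma$ and $P(|U| > k'|\ln\sigma|^{1/2}) \le c_\beta\,\sigma^{2\beta}$. Then,

$$\begin{aligned}
\int_{A_\sigma^c} K_\sigma f(x)\,dx &= P(X + \sigma U \in A_\sigma^c) \\
&= P\left(X + \sigma U \in A_\sigma^c \cap |U| \le k'|\ln\sigma|^{1/2}\right) + P\left(X + \sigma U \in A_\sigma^c \cap |U| > k'|\ln\sigma|^{1/2}\right)
\end{aligned} \qquad (33)$$

and

$$\begin{aligned}
\int_{A_\sigma^c} K_\sigma f(x)\,dx &\le P\left(X + \sigma U \in A_\sigma^c \cap X \in A_{\sigma,\delta} \cap |U| \le k'|\ln\sigma|^{1/2}\right) \\
&\quad + P\left(X + \sigma U \in A_\sigma^c \cap X \in A_{\sigma,\delta}^c \cap |U| \le k'|\ln\sigma|^{1/2}\right) + P\left(|U| > k'|\ln\sigma|^{1/2}\right) \\
&\le P\left(X + \sigma U \in A_\sigma^c \cap X \in A_{\sigma,\delta} \cap |U| \le k'|\ln\sigma|^{1/2}\right) + P\left(X \in A_{\sigma,\delta}^c\right) + c_\beta\,\sigma^{2\beta}.
\end{aligned} \qquad (34)$$

The second term in (34) can be shown to be bounded by a multiple of $\sigma^{2\beta}$ in the same manner as for $t = 0$, for $\sigma < \sigma(\beta)$. We now show that for $\sigma$ small enough, the first term in (34) is zero for every function $f \in \mathcal{H}(\beta,\mathcal{P})$. On the one hand, according to (4) there exists $y \in [X, X + \sigma U]$ such that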



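$$l_j(X + \sigma U) = \sum_{u=0}^{r-j-1} \frac{l_{j+u}(X)}{u!}(\sigma U)^u + \frac{l_r(y)}{(r-j)!}(\sigma U)^{r-j}.$$

If $X \in A_{\sigma,\delta}$ and $|U| \le k'|\ln\sigma|^{1/2}$, it yields

$$\begin{aligned}
|l_j(X + \sigma U)| &\le \left|\sum_{u=0}^{r-j} \frac{l_{j+u}(X)}{u!}(\sigma U)^u\right| + \frac{|l_r(y) - l_r(X)|}{(r-j)!}|\sigma U|^{r-j} \\
&\le \sum_{u=0}^{r-j} \frac{|l_{j+u}(X)|}{u!}|\sigma U|^u + \frac{r!}{(r-j)!}L(X)|y - X|^{\beta-r}|\sigma U|^{r-j} \\
&\le \sum_{u=0}^{r-j} \frac{|l_{j+u}(X)|}{u!}|\sigma U|^u + \frac{r!}{(r-j)!}L(X)|\sigma U|^{\beta-j} \\
&\le \sum_{u=0}^{r-j} \frac{\delta\mathcal{B}}{u!}\left(\sigma|\ln\sigma|^{1/2}\right)^{-(u+j)}\left(\sigma k'|\ln\sigma|^{1/2}\right)^{u} + \frac{r!}{(r-j)!}\,\delta\mathcal{B}\left(\sigma|\ln\sigma|^{1/2}\right)^{-\beta}\left(\sigma k'|\ln\sigma|^{1/2}\right)^{\beta-j}.
\end{aligned}$$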



And thus for $\delta$ small enough, $|l_j(X + \sigma U)| \le \mathcal{B}\left(\sigma|\ln\sigma|^{1/2}\right)^{-j}$ for all $j \in \{1,\dots,r\}$. Since $X + \sigma U \in A_\sigma^c$, this means that

$$L(X + \sigma U) > \mathcal{B}\left(\sigma|\ln\sigma|^{1/2}\right)^{-\beta}. \qquad (35)$$

On the other hand, let $\eta = \max|z_i|$ where the $z_i$'s are the roots of $L$. Suppose that $\deg(L) = q$; for $j = 1,\dots,q$, $|L^{(j)}(x)|/|L(x)| \to 0$ when $|x|$ tends to infinity. Consequently, since $L$ does not vanish outside $[-\eta,\eta]$, there exists $c > 0$ only depending on $L$ such that if $|x| \ge \eta + 1$, then $|L^{(j)}(x)| \le c|L(x)|$.
If $|X| \ge \eta + 1$, then



\L(X + aU)\ < L(X) + Y J 



LV)(X) 



(5<B (dlnd 1/2 



-P 



\aU\ j 

IHx^^U'l ind/ 



(36) 



j'=i 



J! 



/2 



< #8 (d. In d, 1/2 

< 25<8 fallncrl 1 / 2 



C/3< 



<HB (d lncr| 1/2 



-/3+1 



for $\sigma < \sigma(\beta)$ where $c(\beta)$ can be chosen as a continuous function of $\beta$. It then leads to a contradiction with (35) for $\delta$ chosen small enough and thus $P\left(X + \sigma U \in A_\sigma^c \cap X \in A_{\sigma,\delta} \cap |U| \le k'|\ln\sigma|^{1/2} \cap |X| \ge \eta + 1\right) = 0$. Next, let $\bar L := \max_{j=0,\dots,r}\ \sup_{|x| \le \eta+1} |L^{(j)}(x)|$. If $|X| < \eta + 1$, (36) implies that

\L(X + aU)\ < J*8 (a|lna| 1/2 )" a + cZj]^ (crfc'l ln^r 1 / 2 )' 



3=1 



(trllncrl 1 



/2 



C/3 



o-l lnd" 1/2 



25*8 ( crl lncr| 1/2 



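for $\sigma < \sigma(\beta)$ where $\sigma(\beta)$ can be chosen as a continuous function of $\beta$, which also gives a contradiction with (35). Thus $P\left(X + \sigma U \in A_\sigma^c \cap X \in A_{\sigma,\delta} \cap |U| \le k'|\ln\sigma|^{1/2} \cap |X| < \eta + 1\right) = 0$. Finally, for $\sigma < \sigma(\beta)$, $P\left(X + \sigma U \in A_\sigma^c \cap X \in A_{\sigma,\delta} \cap |U| \le k'|\ln\sigma|^{1/2}\right) = 0$ and (34) gives that

$$\int_{A_\sigma^c} K_\sigma f(x)\,dx = O_\beta(\sigma^{2\beta}).$$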



• Case $t \ge 2$. The same method as before can be applied by assuming $X \sim K_\sigma^{t-1} f$ and $U \sim \psi$. Similarly, $\int_{A_\sigma^c} K_\sigma^t f(x)\,dx$ can be decomposed into three terms as in (34). Two of them are $O_\beta(\sigma^{2\beta})$ and the remaining term is zero for $\delta$ small enough.

Proof of (29):

• Case $t = 0$: According to Condition (7),



/ f(x)dx < a H ^ 2 f y/ffr)da 



< a Hl/2 I VM^-I cxp(-x 2 ) dx 
Jr 



< a 2p MK 2 < 



since Hi > 4/3. 
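• Case $t = 1$: We have

$$\int_{E_\sigma^c} K_\sigma f(x)\,dx = \int_{E_\sigma^c \cap A_\sigma} K_\sigma f(x)\,dx + \int_{E_\sigma^c \cap A_\sigma^c} K_\sigma f(x)\,dx$$

where the second integral is less than $\int_{A_\sigma^c} K_\sigma f(x)\,dx$, which is $O_\beta(\sigma^{2\beta})$ for $\sigma < \sigma(\beta)$ according to (28).

For $\delta < 1$ to be chosen later uniformly on $\mathcal{H}(\beta,\mathcal{P})$, consider the set $E_{\sigma,\delta} = \{x \in \mathbb{R};\ f(x) \ge \sigma^{\delta H_1}\}$. Let $X \sim f$ and $U \sim \psi$. By applying Lemma 8 as before with $H = 2\beta$, let $k'$ depending on $\beta$ and $\sigma(\beta)$ be such that for all $\sigma < \sigma(\beta)$, $k'\sigma|\ln\sigma|^{1/2} \le \gamma$ and $P(|U| > k'|\ln\sigma|^{1/2}) \le c_\beta\,\sigma^{2\beta}$. Then,

$$\begin{aligned}
P\left(X + \sigma U \in E_\sigma^c \cap A_\sigma\right) &\le P\left(X + \sigma U \in E_\sigma^c \cap A_\sigma;\ |U| \le k'|\ln\sigma|^{1/2}\right) + P\left(|U| > k'|\ln\sigma|^{1/2}\right) && (37) \\
&\le P\left(X + \sigma U \in E_\sigma^c \cap A_\sigma;\ |U| \le k'|\ln\sigma|^{1/2};\ X \in A_\sigma\right) + P\left(X \in A_\sigma^c\right) + P\left(|U| > k'|\ln\sigma|^{1/2}\right) \\
&\le P\left(X + \sigma U \in E_\sigma^c;\ |U| \le k'|\ln\sigma|^{1/2};\ X \in A_\sigma \cap E_{\sigma,\delta}\right) && (38) \\
&\quad + P\left(X \in E_{\sigma,\delta}^c\right) + P\left(X \in A_\sigma^c\right) + c_\beta\,\sigma^{2\beta}. && (39)
\end{aligned}$$

According to (28), the second term in (39) is $O_\beta(\sigma^{2\beta})$ for $\sigma < \sigma(\beta)$. The first term in (39) can be bounded as previously for $t = 0$, leading to the condition

$$\delta H_1 > 4\beta$$

and thus we choose $\delta \in (0,1)$ to satisfy this last condition. It remains to control the probability given in (38). On the one hand, since $X + \sigma U \in E_\sigma^c$ and $X \in E_{\sigma,\delta}$,

$$|\ln f(X + \sigma U) - \ln f(X)| \ge (1-\delta)H_1|\ln\sigma|. \qquad (40)$$

On the other hand, since $X \in A_\sigma$ and $|U| \le k'|\ln\sigma|^{1/2}$,

$$\begin{aligned}
|\ln f(X + \sigma U) - \ln f(X)| &\le \sum_{j=1}^{r} \frac{|l_j(X)|}{j!}|\sigma U|^j + L(X)|\sigma U|^\beta \\
&\le \mathcal{B}e^{k'} + \mathcal{B}k'^{\beta} := d_1(\beta,k').
\end{aligned}$$

This is in contradiction with (40) for $\sigma < \exp\left(-\frac{d_1(\beta,k')}{(1-\delta)H_1}\right)$ and then (38) is zero for $\sigma < \sigma(\beta,H_1)$, where $\sigma(\beta,H_1)$ can be chosen continuous.

• Case $t \ge 2$. We follow the same proof as before:

$$\int_{E_\sigma^c} K_\sigma^t f(x)\,dx = \int_{E_\sigma^c \cap A_\sigma} K_\sigma^t f(x)\,dx + \int_{E_\sigma^c \cap A_\sigma^c} K_\sigma^t f(x)\,dx$$

where the second integral is less than $\int_{A_\sigma^c} K_\sigma^t f(x)\,dx$, which is $O_\beta(\sigma^{2\beta})$ according to (28). Let $X \sim f$ and $U_1,\dots,U_t \sim \psi$. By applying Lemma 8 as before with $H = 2\beta$, let $k'$ depending on $\beta$ and $\sigma(\beta)$ be such that for all $\sigma < \sigma(\beta)$, $k'\sigma|\ln\sigma|^{1/2} \le \gamma$ and $P(|U| > k'|\ln\sigma|^{1/2}) \le c_\beta\,\sigma^{2\beta}$. Then,

$$\begin{aligned}
\int_{E_\sigma^c \cap A_\sigma} K_\sigma^t f(x)\,dx &= P\left(X + \sigma U_1 + \dots + \sigma U_t \in E_\sigma^c \cap A_\sigma\right) \\
&\le P\left(X + \sigma U_1 + \dots + \sigma U_t \in E_\sigma^c \cap A_\sigma;\ \forall j\ |U_j| \le k'|\ln\sigma|^{1/2}\right) + \sum_{j=1}^{t} P\left(|U_j| > k'|\ln\sigma|^{1/2}\right) \\
&\le P\left(X + \sigma U_1 + \dots + \sigma U_t \in E_\sigma^c \cap A_\sigma;\ \forall j\ |U_j| \le k'|\ln\sigma|^{1/2};\ X \in A_\sigma \cap E_{\sigma,\delta}\right) && (41) \\
&\quad + P\left(X \in E_{\sigma,\delta}^c\right) + P\left(X \in A_\sigma^c\right) + t\,c_\beta\,\sigma^{2\beta}. && (42)
\end{aligned}$$

According to (28), the second term in (42) is $O_\beta(\sigma^{2\beta})$ for $\sigma < \sigma(\beta)$, as well as the first term if $\delta H_1 > 4\beta$. We thus choose $\delta \in (0,1)$ to satisfy this last condition. As before, we check that the probability given in (41) is 0. On the one hand, since $X + \sigma U_1 + \dots + \sigma U_t \in E_\sigma^c$ and $X \in E_{\sigma,\delta}$,

$$|\ln f(X + \sigma U_1 + \dots + \sigma U_t) - \ln f(X)| \ge (1-\delta)H_1|\ln\sigma|. \qquad (43)$$

On the other hand, since $X \in A_\sigma$ and for all $j$, $|U_j| \le k'|\ln\sigma|^{1/2}$, then

$$\begin{aligned}
|\ln f(X + \sigma U_1 + \dots + \sigma U_t) - \ln f(X)| &\le \sum_{j=1}^{r} \frac{|l_j(X)|}{j!}\left|\sigma(U_1 + \dots + U_t)\right|^j + L(X)\left|\sigma(U_1 + \dots + U_t)\right|^\beta \\
&\le \sum_{j=1}^{r} \frac{|l_j(X)|}{j!}\left(\sigma t k'|\ln\sigma|^{1/2}\right)^j + L(X)\left(\sigma t k'|\ln\sigma|^{1/2}\right)^\beta \\
&\le \mathcal{B}e^{tk'} + \mathcal{B}(tk')^\beta.
\end{aligned}$$

This is in contradiction with (43) for $\sigma < \sigma(\beta,H_1)$ and finally (41) is zero.

Proof of $E_\sigma \cap A_\sigma \subset J_{\sigma,k}$:

For $\beta \le 2$, the inclusion is obvious since $f_0 = f$ and thus $J_{\sigma,k} = \mathbb{R}$. To prove the case $\beta > 2$, we show by induction on $u \in \mathbb{N}$, $1 \le u \le k$, that for every $h \in (0,1)$, there exists a continuous function $\sigma(\beta,H_1,h)$ such that for all $\sigma < \sigma(\beta,H_1,h)$, for all $f \in \mathcal{H}(\beta,\mathcal{P})$ and all $x \in E_\sigma \cap A_\sigma$,

$$f_u(x) > (1-h)f(x). \qquad (44)$$

• Let $u = 1$ and $L^{(1)}$ be defined by

$$L^{(1)}(x) = \sum_{j=1}^{r} \frac{|l_j(x)|}{j!}\left(k'\sigma|\ln\sigma|^{1/2}\right)^{j-1} + L(x)\left(k'\sigma|\ln\sigma|^{1/2}\right)^{\beta-1}.$$

Then, for all $x$ and all $y \in D_x$, we have

$$\ln f(x) - L^{(1)}(x)|y-x| \le \ln f(y) \le \ln f(x) + L^{(1)}(x)|y-x|. \qquad (45)$$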
Note that for all $x$ in $A_\sigma$, $L^{(1)}(x) \le \mathcal{B}e^{k'}\sigma^{-1}|\ln\sigma|^{-1/2}$. Starting from (45) and following the proof of Lemma 1 for the case $\beta = 1$, it yields for all $x$ in $A_\sigma$, for all $H' > 0$ and all $\sigma < \sigma(\beta,H')$,

$$(K_\sigma f)(x) = f(x)\left[1 + L^{(1)}(x)\,\sigma\,O_\beta(1)\right] + O_{\beta,H'}(\sigma^{H'}). \qquad (46)$$

For every $x \in E_\sigma \cap A_\sigma$, taking $H' = H_1 + 1$ in (46) it yields

$$\frac{f_1(x)}{f(x)} = 2 - \frac{K_\sigma f(x)}{f(x)} = 1 - L^{(1)}(x)\,\sigma\,O_\beta(1) - \frac{\sigma^{H_1+1}}{f(x)}\,O_{\beta,H_1+1}(1).$$

Next, $\frac{\sigma^{H_1}}{f(x)} \le 1$ since $x \in E_\sigma$. Thus, for every $h \in (0,1)$, there exists $\sigma(\beta,H_1,h)$ such that for all $\sigma < \sigma(\beta,H_1,h)$, for all $f \in \mathcal{H}(\beta,\mathcal{P})$ and every $x \in E_\sigma \cap A_\sigma$, $f_1(x) > (1-h)f(x)$.

• The previous point is sufficient for $\beta \le 4$ since $k = 1$ in this case. We now also suppose that $\beta > 4$ and thus that $k \ge 2$. Suppose that the integer $2 \le u \le k$ is such that (44) is true for the integer $u - 1$. Let $h \in (0,1)$; there exists $\sigma(\beta,H_1,h)$ such that for all $\sigma < \sigma(\beta,H_1,h)$, for all $f \in \mathcal{H}(\beta,\mathcal{P})$ and every $x \in E_\sigma \cap A_\sigma$, $f_{u-1}(x) > \left(1 - \frac{h}{2}\right)f(x)$. Note that since $2u < \beta$ we find that for all $x$ and all $y \in D_x$,



2u-l 



2u-l 



ln/(x)+]T ^M'-fy-^^'W <In/(y) < In M^ ( y_^ +(y _ x )2« L H (x) 



3=1 J 3=1 J 



.P-2u 



Thus for all x in A n 



with L(")(x) = E^n^ (fc'^l lntxl 1 / 2 )" 7 + L(x) (fc'allnal 1 / 2 ) 

< OSe^ cr _2 "| lncr|~ u . Following the proof of Lemma 1 for the case j3 = 2u, it yields for all 
x G A(j and for all H' > and all a < a(/3, 7?'), 



(K a f u _ 1 )(x) = f(x) \l + R^(x)O (a 2u )] + Of >tW (a H ') 



with fit") = a 2u+2 L^(x) + E -"i a,- \lj 



x)| i and we have supy-g^^^) sup xeB(rnAo 
^— Then, using (47) with H' = Hi + 1, it yields for all x £ A a n S CT 



(47) 

|(T 2M i?( u )(x)| < 



|ln<r| 



/(*) 



= l - 



K a f u _i(x) - 
RW(x)Op(a 2u ) + 



r i?i+i 



/(*) 



fu-l(x) 

m 



There exists $\sigma(\beta,H_1,h)$ such that for all $\sigma < \sigma(\beta,H_1,h)$, $R^{(u)}(x)O_\beta(\sigma^{2u}) + \frac{\sigma^{H_1+1}}{f(x)}c_{\beta,H_1+1} < h/2$

and the induction is complete. By choosing $u = k$ and $h = 1/2$, it finally gives that $E_\sigma \cap A_\sigma \subset J_{\sigma,k}$ for $\sigma < \sigma(\beta,H_1)$.



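Proof of (30): We have

$$\int_{\mathbb{R}} g_k(x)\,dx = \int_{J_{\sigma,k}} f_k(x)\,dx + \frac{1}{2}\int_{J_{\sigma,k}^c} f(x)\,dx = 1 + \int_{J_{\sigma,k}^c}\left[\frac{1}{2}f(x) - f_k(x)\right]dx$$

since $\int_{\mathbb{R}} f_k(x)\,dx = 1$ (see Lemma 12). Moreover, $f_k$ is a linear combination of $K_\sigma^t f$, $t = 0,\dots,k$, according to Lemma 12. Thus it yields $\int_{\mathbb{R}} g_k(x)\,dx = 1 + O_{\beta,H_1}(\sigma^{2\beta})$ for $\sigma < \sigma(\beta,H_1)$ thanks to (28), (29) and the fact that $J_{\sigma,k}^c \subset A_\sigma^c \cup E_\sigma^c$ for $\sigma < \sigma(\beta,H_1)$.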

Proof of (31): Let H > 0. According to Lemma 1 and (30), for all a < a(J3,H,Hi), for all 
x £ E a n j4o- C J a ,ki we have 



|tf«,M*) "/(*)! < Ug k (y)dy) \K a f k (x)-f(x)\+ Ug k (y)dy^ -1 

9k(y)dy] [ \\f{u)- f k {u)\ip a {x-u)du 



fix) 



< c p f(x)R f (x)cT rj +c p a H + Ug k {y)dy\ (0) 



where (0) = J JC {\f{u) — f k (u)}i/j (T (x — u)du. Let D x = {u £ E; \x — u\ < k'a\ ]na\i} such that 
k'a\ lncr|3 < <y. According to the third result of Lemma 12 and Lemma 8, 

< / \\f{u) - f k {u)\ i>„{x - u)du < 2 k + 2 ^ f ^ a (x - u)du < cpa H . 

Jj c a ^Dg I 1 J V 71 " J D< 

Next, if x £ A a n E a , there exists i > 1 such that for all u £ D Xl u £ A Gyt H S^j. This result 
can be proved by adapting some parts of the proof of (28) and (29). Moreover, by changing 03 
into fQ3, it can be shown that there exists a(/3) such that for all a < <r(/3), n E a ,t C J CT ,fe- 
Thus for cr small enough, f J0 nD c{\f{ u ) — fk(u)}ipa(x — u)du = 0. Finally, (^>) < cpa H and 

$$|K_\sigma h_k(x) - f(x)| \le c_\beta f(x)R_f(x)\sigma^\beta + c'_\beta\,\sigma^H. \qquad \square$$

Proposition 2.

Let $\beta > 0$ and $k \in \mathbb{N}$ such that $\beta \in (2k, 2k+2]$. There exists a positive constant $\sigma(\beta)$ such that for all $f \in \mathcal{H}(\beta,\mathcal{P})$ and all $\sigma < \sigma(\beta)$,

$$\mathrm{KL}(f, K_\sigma h_k) = \int f(x)\ln\left(\frac{f(x)}{K_\sigma h_k(x)}\right)dx = O_\beta(\sigma^{2\beta})$$

where $h_k$ is defined by (27) and where $\sigma(\beta)$ can be chosen as a continuous function of $\beta$.

Proof. As a preliminary, we remark that if $p$ and $q$ are two densities and $S$ is a set, then

$$\int_S p\ln\left(\frac{p}{q}\right) \le \int_S p\,\frac{p-q}{q} = \int_S \frac{(p-q)^2 + q(p-q)}{q} = \int_S \frac{(p-q)^2}{q} + \int_{S^c}(q-p)$$

since $\int_S p = 1 - \int_{S^c} p$, $\int_S q = 1 - \int_{S^c} q$ and $\int_S (p-q) = \int_{S^c}(q-p)$. We use this inequality with the densities $f$ and $K_\sigma h_k$, and the sets $A_\sigma$ and $E_\sigma$, where $E_\sigma$ is defined with $H_1 = 4\beta + 1$, to obtain the following control of $\mathrm{KL}(f, K_\sigma h_k)$:

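$$\begin{aligned}
\int f(x)\ln\left(\frac{f(x)}{K_\sigma h_k(x)}\right)dx &= \int_{A_\sigma \cap E_\sigma} f(x)\ln\left(\frac{f(x)}{K_\sigma h_k(x)}\right)dx + \int_{A_\sigma^c \cup E_\sigma^c} f(x)\ln\left(\frac{f(x)}{K_\sigma h_k(x)}\right)dx \\
&\le \int_{A_\sigma \cap E_\sigma} \frac{[f(x) - K_\sigma h_k(x)]^2}{K_\sigma h_k(x)}\,dx && (48) \\
&\quad + \int_{A_\sigma^c \cup E_\sigma^c} [K_\sigma h_k(x) - f(x)]\,dx && (49) \\
&\quad + \int_{A_\sigma^c \cup E_\sigma^c} f(x)\ln\left(\frac{f(x)}{K_\sigma h_k(x)}\right)dx. && (50)
\end{aligned}$$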
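The preliminary inequality used in this proof, namely $\int_S p\ln(p/q) \le \int_S (p-q)^2/q + \int_{S^c}(q-p)$, relies only on $\ln u \le u - 1$ and on both densities having total mass 1, so it can be sanity-checked in a discrete analogue. The sketch below replaces densities by probability vectors and $S$ by a set of indices; this discretization and all names in it are assumptions made for illustration, not part of the paper.

```python
import math
import random


def restricted_kl(p, q, S):
    """Sum over S of p_i * ln(p_i / q_i)."""
    return sum(p[i] * math.log(p[i] / q[i]) for i in S)


def upper_bound(p, q, S):
    """Chi-square-type term on S plus the mass difference on the complement of S."""
    complement = [i for i in range(len(p)) if i not in S]
    chi2_part = sum((p[i] - q[i]) ** 2 / q[i] for i in S)
    mass_part = sum(q[i] - p[i] for i in complement)
    return chi2_part + mass_part


def random_distribution(n, rng):
    """Random strictly positive probability vector of length n."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    rng = random.Random(0)
    p = random_distribution(8, rng)
    q = random_distribution(8, rng)
    S = {0, 2, 3, 5}
    print(restricted_kl(p, q, S), "<=", upper_bound(p, q, S))
```

In the proof, $S = A_\sigma \cap E_\sigma$, which is why the complement terms are integrals over $A_\sigma^c \cup E_\sigma^c$.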
• Control of (48): 

Let $H > 0$. According to Lemma 2 with $H_1 = 4\beta + 1$, there exists $\sigma(\beta,H) > 0$ such that for all $x \in A_\sigma \cap E_\sigma$ and for all $\sigma < \sigma(\beta,H)$, $[K_\sigma h_k(x) - f(x)]^2 \le \left[A_{\beta,H}f(x)R_f(x)\sigma^\beta + \Omega_{\beta,H}\,\sigma^H\right]^2$ where $A_{\beta,H}$ and $\Omega_{\beta,H}$ are two constants. Moreover, according to Lemma 13, there exists $\sigma(\beta) > 0$ such

k M x) > TT §^m 



with D = |^f. Thus for all a < a((3, H) A a((3), 



2A ( 3^fi / 3 i ^ 

D 

Then, 

A 2 



0. + Apa ap )<^ +H R f (x). 



[f(x) -KMx)] ^ < A^ 2>2/ W *,(*)»/(*)*> 

K a h k (x) D JA„^E a 

' n2f> ' H -{l + A^)a 2H ^ if3+ ^ [ f(x)dx 



D 

17 



(1 + A (3 a 2/3 )a' 3+ff - 4 ' 3 - 1 / R f (x)f(x)dx. (51) 



A a r\E a 



Thus the two integrals n£ Rf(x) 2 f(x)dx and n£ , Rf(x)f{x)dx have to be controlled. 
The first integral can be decomposed into



j R f (x) 2 f(x)dx = I 



A a r\E a 



a 3 \h( x )\ 1 f{x)dx 
4+i f L{xff{x)dx + Y j a 2 j f \lj(x)\¥ f(x)dx 

J A a r\E„ - = 1 J A a ^E a 

r f e 

2y^a r+ idj I \lj{x)\7 L(x)f{x)dx 

Mx)\f\l f (x)\Tf(x)dx. 



i=i 



+ E a 

3,3' =1 



A a r\E a 



Using the Hölder inequality and Condition (6), for all $j = 1,\dots,r$,



AvC]E a 



\l 3 (x)\Tf{x)dx< 



\lj( x )\ j f( x )dx 



2)3 
2..-1-I _- 



f{x)dx 



2/3 

and f ArrnE<7 L{x) 2 .f(x)dx < /„ L(x) 2+ i {x)f{x)dx [/ R f(x)dx] ^ < ( 
Cauchy-Schwarz inequality and (52), for all £ {1, ...,r},j ^ f, 



2/3 + e 2 o 



2/3 



(52) 



\l,(x)\f\l r (x)\?f(x)dx< 



' AfjV\Efj 

and for all j £ {1, . . . , r}, 



\ l j( x )\ 3 f( x )dx 



llj'Wl^ f( x )dx 



2)3 



A a V\E a 



\lj{x)\-L(x)f{x)dx < 



( r+l 



|ij(x)| j f{x)dx 



L{x) 2 f{x)dx 



_ 2/3 

< 



Finally, J R / (x) 2 /(x)dx < y£ a 3 J C^. 

For the second integral, 



Rf(x)f{x)dx 
A^nE,, 



A„nEv 



i r+1 L(x) +22aj\lj( 

3=1 



f{x)dx 



< a r+x J j L{x) 2 f(x)dxJ j f(x)dx + y~)ajJ / |^(a;)| i f{x)dxJ I 

y y jr . =1 y jr y jr 



r+l 

E g 

3=1 



A a r\E a 



[f{x)-K a h k {x)\< 
K a h k (x) 



' r+l 



dx < -f(i+w EM c ^ 



D 



3=1 

£(1 + A f3 a 2 ^)a 2H -^- 2 



D 

D 



' r+l 



(i+VV- 3M |E»i| c 

i=i 



1B+E 



By taking $H = 5\beta + 1$, it gives that there exists $\sigma(\beta) > 0$ such that for all $\sigma < \sigma(\beta)$,

[f{x) - K a h k {x)f 



L 



K a h k (x) 



-dx = O [a^) 



• Control of (49): 

According to Lemma 12, 

h k (x) 



g k (x)dx f k (x)tj„Jx) + -f{x)\jo k (x) 



thus 



fc+i 



i^M*) < 2^ ("j 1 ) Klf(x) + K„f(x). 
3=1 

According to (28) and (29) in Lemma 2 with $H_1 = 4\beta + 1$, there exists $\sigma(\beta) > 0$ such that for all $\sigma < \sigma(\beta)$,


[K a h k (x) - f(x)}dx < / K a h k (x)dx + / f(x)dx 

A%UE% JA^UEZ JA%UEc 



fc+1 

< 2^2 < k+1 



3=1 
fc+1 



Kif(x)dx+ / K <J f{x)dx + / /(x)dx 

A=UB= JA'UB; JA'UE; 



< 



< 2 



2 E(T) / K2/(z)<fc+[2(fc + l) + l] / Ka.f(x)dx+ f K°J(x)dx 

j = 2 J A °a ■> A % J A % 

1/1 n n 

+ 2 E(T) / Kif(x)dx+[2(k+l) + l] / K a f(x)dx+ / K°J(x)dx 

j=2 JE% JE<L JE<L 

fc+1 

2 E(T) +2(^ + 2) 



3=2 
• Control of (50):

According to Lemma 13, for all a < ct(/3), K a hk{x) > 1+ ® 2 $ f(x) then 



1 - 


- Ap<& 




D 


1 - 






D 


1 - 


- Apa^ 


D 



In conclusion, there exists $\sigma(\beta) > 0$ such that for all $\sigma < \sigma(\beta)$, $\mathrm{KL}(f, K_\sigma h_k) = O_\beta(\sigma^{2\beta})$.



□ 



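3.2. Proof of Theorem 2

Proof. For the definition of $E_\sigma$, we choose $H_1 = 4(\beta + 1)$. Let $\tilde h_k$ be the restriction of $h_k$ to an interval $[-\mu_\sigma, \mu_\sigma]$, normalized in order to have a density function:

$$\tilde h_k(x) = \frac{h_k(x)\,1_{[-\mu_\sigma,\mu_\sigma]}(x)}{\int_{-\mu_\sigma}^{\mu_\sigma} h_k(y)\,dy}$$

where $\mu_\sigma$ depends on $\sigma$ and will be chosen later such that

$$\mu_\sigma > \sigma. \qquad (53)$$

Let $\varepsilon \in (0, \pi^{-1/2})$. According to Proposition 3 in Appendix A.2, there exists a discrete distribution $F$ on $[-\mu_\sigma, \mu_\sigma]$ with at most $54\,\mu_\sigma\sigma^{-1}e^2\left[-\ln(\sqrt{\pi}\,\varepsilon) \vee 1\right]$ support points such that

$$\|\tilde h_k * \psi_\sigma - F * \psi_\sigma\|_\infty \le \frac{\varepsilon}{\sigma}.$$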

Denoting p(x)dx = (jj^ ^ ] hk{x)dx^j F * ip a (x), it gives for all ie € R, 



(54) 



< 



hk{x)dx 



f\- u „M„] h k(y) d y 



Si-m.,^] hk{y)dy 

tpa(x) -F* 1p a {x) 



* ij} a (x) - F * Vo-C^) 



By applying Lemma 14 with p = |, it gives that for all cr < 1 — 2 1//fc and for all a; € 

k 



h k (x) < 4M 



4 

71 



HI) 
and thus (/i fe l[- M „ l/lCT ]<0 * i>a[x) < 4M (jSj ^{^-). Now, we choose /i a := 2yin (±
in order to obtain that || \h>k^-[—n a ,n a \ c ') *V'o-|| — §■ This last inequality together with (54) yields 

\\K„h k - < ^. 

We also define the function t := p + o- 6 ^ +5 -0 (T and the finite Gaussian mixture with density 

t{x) p(x) + a^+^vix) 



p(x) :-- 

Then we want to upper bound 



KL(/,p) = I f(x) In 
f(x)\n 
fix) In 



dx 



fix) 
K a hkix) 

fix) 



+ / /(x)ln 



K a hkix) 
K a hkix) 



dx+ I fix) In 
dx 



K a hkix) 



tix) 



dx 



/(or) In 



tix) 

K a h k jx) 
tix) 



dx+ / /0)ln 



tix) 
Pix) 



dx 



dx+ / /(x)ln 



tix) 
Pix) 



dx 



13 



14 



• Control of $I_1$: According to Proposition 2, for all $\sigma < \sigma(\beta)$,

$$\int f(x)\ln\left(\frac{f(x)}{K_\sigma h_k(x)}\right)dx = O_\beta(\sigma^{2\beta}).$$



• Control of 12 : According to Lemma 14. K a hkix) < AM y-^J f° r a small enough and since 

six) > ct 6/3 + 5 Vv(x), 



12 



< 



< 



( AM ( 



/(X) In 
fix)dx 



4 



a^+ 5 ip a ix) 



dx 



(6/3 + 4)| lncr| +ln [ AM ( -j= 



[ fix)%dx. 



For the second integral, 



E" V 



;fix)dx < <T" 



'■Vfjxjda 



< <j 2fi / x 2 y?Mipix)dx 

Jk 

< <j 2I3 V2Mtt- hu ia = AnVMa 213 . 
Similarly, J Ec f(x)dx < a 2 ? +2 V2M and finally



12 



< \ In ( AM ( -j=j 1 + (6/3 + 4)| In cr | \ V2Ma 2p+2 + AnVMa 2 ^ '. 



Thus 



12 



• Control of 



13 



On the one hand, 

\K„h k (x)-t(x)\ < \K a h k {x)-p{x)\ + \p{x)-t{x)\ 
< Sea' 1 +a 6/3+5 ^ CT (x) 

3^-1+^+^-1/2. 



On the other hand, according to Lemma 13, for all x <G R and for all cr < a(/3), K a h k {x) > 
e x € 



6^(1+^^) /^)- Since * e then ^ft fc (x) > 6A/(1 ^^) ^ 4(/?+1) - Thus, t(x) > p(x) > 
K a h k (x) - Sea- 1 > ^ I{1+A ^J W) ~ Sea 1 . Finally, 



13 



< 



< 



< 



/(a) /W*) r *(*) d 



Sea- 1 



t(x) 



. ^6/3+4^.-1/2 



3ecr _ 



2(1+3^3 
3EO- 1 + Cr^+V" 1 ^ 



2(1+A (3 <7^ 



- 36CT- 1 



Let 5' := 1 + j^pjj and we set e := (7 *' 4 ^+ 1 )+ 1 . It yields 

(7T-V2 + 3)cr 6 ' 3 + 4 



13 



< 



T 4(/9 + l) 



30-6/3+4 



• Control of 



14 



Note that 44 



fr_ i /ift (y)dy + ct 6,3+5 < 1 + cr 6 ' 9 + 5 and thus 



14 



< 



/(x) In (1 + cr 6/?+5 ) cfe 



< cr 



6,3+5 



< a 



2/3 



Finally, we obtain that KL(/, p) = Op(a 2 P). Moreover, according to the choice of e, we have 
that 



\ \V^\Vs 



= 2 



In 



4M / 4 V 



\ \V^\VsJ 

Gp\ lncr|5 



-(6/3+4) 
...



where 



Gp = 2 Win 



V3 



(6/3 + 4). 



(55) 



Thus there exists $\sigma(\beta)$, continuous in $\beta$, such that (53) is fulfilled for $\sigma < \sigma(\beta)$. Furthermore, the resulting mixture has $k_\sigma$ components such that



1„2 



k a < 54/i CT cr e 



1 V In 



1 



< G (3 |lncr|2 54(T" 1 e 2 
= GW -1 ! lntrl^. 



Tie 
1 V In 



1 



1 



•■6,8+5 



(56) 

□ 



4. Proof of the lower bound 



4.1. Proof of Proposition 1

Note that for every j, ipj is supported by 
a 



J, 



a 



a 

4~D 



UL , 



and thus the supports of the $\varphi_j$, $1 \le j \le D$, are disjoint. We also note that for all $x \in [-\alpha/2, \alpha/2]^c$, $f_\theta(x) = \omega(x)$ and for all $x \in [-\alpha/2, \alpha/2]$, there exists a unique $j \in \{1,\dots,D\}$ such that $f_\theta(x) = 2\xi + (2\theta_j - 1)\varphi_j(x)$, where $\varphi_j(x) = 0$ if $x \in I_j \setminus J_j$. The proof of Proposition 1 is decomposed into two lemmas.

Lemma 3. Density function and monotonicity conditions.

For all $D \in \mathbb{N}^*$ and all $\theta \in \{0,1\}^D$, the function $f_\theta$ defined by (9) is a positive density function such that for all $x \in [-\alpha/2, \alpha/2]$, $f_\theta(x) \in [\xi, 3\xi]$. This function also fulfills the following monotonicity conditions:

1. $\forall x \in [-\alpha, \alpha]$, $f_\theta(x) \ge \xi$ and $\forall x \in [-\alpha, \alpha]^c$, $f_\theta(x) \le \xi$.

2. $f_\theta$ is nondecreasing on $(-\infty, -\alpha)$ and nonincreasing on $(\alpha, \infty)$.
3. $\forall x \in \mathbb{R}$, $f_\theta(x) \le \bar M\psi(x)$ with $\bar M = M \vee 3\sqrt{2\pi}\,\xi\exp(\alpha^2/4)$.

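Proof. For all $x \in [-\alpha/2, \alpha/2]^c$, $f_\theta(x) = \omega(x) > 0$ since $\omega$ is positive. Moreover, for all $x \in [-\alpha/2, \alpha/2]$, there exists a unique $j \in \{1,\dots,D\}$ such that $x \in I_j$. Then,

$$f_\theta(x) = \omega(x) + (2\theta_j - 1)\varphi_j(x) = 2\xi + (2\theta_j - 1)\varphi_j(x).$$

Thus

$$|f_\theta(x) - 2\xi| \le |2\theta_j - 1|\,\|\varphi_j\|_\infty \le \xi$$

since $D^{-\beta} \le 1$. Thus for all $x \in [-\alpha/2, \alpha/2]$, $f_\theta(x) \in [\xi, 3\xi]$. Finally, $f_\theta$ is a positive function on $\mathbb{R}$. Moreover,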
Moreover, 



fe(x)dx = / V (26»j - 1) / <pj(x)dx 

Js. Jr . =1 J/j 

= / f; (2^ - / *>(y)<fo 



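because $\int_{\mathbb{R}} \omega(x)\,dx = 1$ and $\int_{\mathbb{R}} \varphi(y)\,dy = 0$. Thus, $f_\theta$ is a density function.

On $(-\infty, -\alpha)$, since $f_\theta(x) = \omega(x)$ and $\omega$ is a nondecreasing function on $(-\infty, -\alpha)$, the function $f_\theta$ is nondecreasing on $(-\infty, -\alpha)$. Moreover,

$$\forall x \le -\alpha,\quad f_\theta(x) \le f_\theta(-\alpha) = \omega(-\alpha) = \xi.$$

In the same way, the function $f_\theta$ is nonincreasing on $(\alpha, \infty)$ and

$$\forall x \ge \alpha,\quad f_\theta(x) \le f_\theta(\alpha) = \omega(\alpha) = \xi.$$

For all $x \in [-\alpha, \alpha]$,

• if $x \in [-\alpha, -\alpha/2)$, $f_\theta(x) = \omega(x) \ge \omega(-\alpha) = \xi$ because $\omega$ is nondecreasing and $\omega(-\alpha) = \xi$;

• if $x \in (\alpha/2, \alpha]$, $f_\theta(x) = \omega(x) \ge \omega(\alpha) = \xi$ because $\omega$ is nonincreasing and $\omega(\alpha) = \xi$;
• if $x \in [-\alpha/2, \alpha/2]$, $f_\theta(x) \in [\xi, 3\xi]$ thus $f_\theta(x) \ge \xi$.

For the last point, we have that for all $x \in [-\alpha/2, \alpha/2]^c$, $f_\theta(x) = \omega(x) \le M\psi(x)$. Moreover, for all $x \in [-\alpha/2, \alpha/2]$, $f_\theta(x) \le 3\xi \le 3\sqrt{2\pi}\,\xi\exp(\alpha^2/4)\psi(x)$. Finally, for all $x \in \mathbb{R}$, $f_\theta(x) \le \bar M(\xi,\alpha,M)\psi(x)$

with $\bar M(\xi,\alpha,M) := M \vee 3\sqrt{2\pi}\,\xi\exp(\alpha^2/4)$. □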

Lemma 4. Let /3 G [/3,/3]- For all G {0, l} 13 , i/ie function In /g is locally j3-H6lder: for all x,y 
such that \x — y\ < ^, 

| (In /g)M (x) - (In /g)M (y) | < L% [3, L, a)r\\x - yf~ r 

where L(/3, f3, L,a) does not depend on D. Moreover, there exists a constant C(/3, /3,C,a), which 
can be taken identical for every D, such that for any integer j = 1, . . . , r and for all D G N* , 



f \Qafo)W(x)\^fo{x)dx<C(l,p,C,a), 
Jr 



and 

/ \L{p, /3, L, a)\ — f e (x)dx < C{P, (3, C, a). 
Jr. 

If D is a positive even integer, for any integer j =0,...,r, |(ln/g)»(0)| < ln(2£). 



Proof. Let j e {1,...,D} and 1 < t < r + 1. We start by upper bounding sup xeI .\(\nfg)^ t '(x)\. 
According to Lemma 17, for all x € Ij, 



(r) ,...,>)t)eH t u=0 



with 



u=0 



u=0 



For all u € {1, . . . ,t}, 



(r] , . . . , r] t ) G N t+1 ; ^ m lu = t, Vu = 2 4 " 1 

££)u-j3 



A \ a 



-I ll^ U) ||oo< 



Then, for all • ■ • , Vt) G S t , 



< £)Ei=i«'h.-^Ei=i'!»^=i% a -EU'«i» x |/ | 



I/O 



< ^-^i-^-l-^-t x | /e(a .)|^ 

since £)* =1 ""^u = t and 5Zl=i % = 2 * _1 - Since f e (x) € [£, 3£] and - 770 > 1, 



n /n< 



u=l 

t-1 



\fe(x)\ 



770-2' 



|(m/ e )W(x)| < £ |pfa>,...,»fc)| 
(ijo,...,i|t)eHi 

< • • • > ^)ie 2 " 1 -" o ^ t ^ (2 " 1 -' ,o) «- t r 

(»7o>— >»7t)6 H t 

< 2 IP^,..-^)^*-^" 1 -*^-*- 
(»7o,---,?7t)eHt 

Denoting £>(£) := card(S t ) and B(t) := max/ %) ... )>7 j)eH t |p(?70j ■ ■ • >Vt)\> it leads to 

S up xeI Mlnfef\x)\ < B(t)fl(t)X>*-^a-*. 



(57) 



We now use this preliminary result to prove that In fg is locally /3-Holder. Let (x, y) £ M 2 such 
that \x-y\< f . 

. lfx,ye [-f , f] c , 

|(ln/ e )M(x)-(m/ e )M(y)| = |(ln W )W(x)-(ln W )W(2/)| 

< ir!|x-y|' 9 - r . 

since lnw is locally f3- Holder with 7^ = j and a constant L. 
• If ye hf,f] c andiefy 

— If |x — y\ < -fp then x € Ij\Jj- Thus, ln/g(x) = lnw(x) and 

|(ln/ fl )W(x)-(lii/ fl )W(y)| = |(lna>)W(x)-(lnw)«(y)| 

< Lr!|x-j/|^- r . 
If izj < \x-y\ < f , lnw(y) = In (2$) since x € [-3a/4, -a/2] U [a/2, 3a/4] thus if r> 1,
|(ln - (ln/ e )W(y)| < || (In /„) M |U hQ/2 . Q / 2] + || (In w)M|U Jq/2i3q/4] 



< B(r)B(r)D r - l3 a- r 



AD 



— \x-y\ 

a ' 



0-r 



< g(r) r f (r) 4^a^ r! \x-yf~ 



and if r = 0, 



|(ln/ e )(x)-(ln/ e )(y)| < |ln (2£) - In (2£ + (20, - l) Vi (y))| 

< |-ln(l + (20- 1 (2^-l)^(j/))| 

< |(2C)- 1 (2^-1)^(2/)| 

< (2C)- 1 ^(4^)^^|x-y|^ 



0! 



For all x,y e [-a/2, a/2], 3\(j, f) e {1, . . . , D} 2 such that x e 7j and y e Ij'. 
-U\x-y\< 

* if f 7^ i) x £ -fjWj an< i 2/ G Ij'\Jj'> thus 



|(ln/ e )M(x)-(ln/ e )M(y)|=0. 



* if / = i, 



|(ln/ 9 )W(x)-(In/ e )M(i,)| < \x - yf~ r \x - y\ r+1 ~^ || ln/i r+1) || 00>[ _ a/aia/a] 

_ x B(r+ l)B(r + 1) D' r + 1 -' 9 



II 111 / 



r+1 



r!|z-y| /3 - r 



If A<|a>-y|<f:ifr = 0, 



|(ln/ e )(a:)-(lii/ e )(j/)| = 



< 



< 



/ l + (2Q- 1 (2^-l)^-(x) \ 

Vi + (20- 1 (2^-i)vi(y)y 

(20- 1 (2^-l)[^(o;)-^(y)] 



l + (20- 1 (2^-l)Vi(tf) 

2||(/?il|oo 



< 2£>-' 3 (4D)' 3 a-' 3 |x-y|' : ' 

< 2^a~P\x - yf = 24^^ g(1) Q ^ (1) 0!|x - yf 
and if $r \ge 1$

|(ln/ e )W(x)-(ln/ e )W(y)| < 2||(ln/ fl )W|| 00 , [ _ a/2 , a/2] 

< 2B(r)B(r)— —[ — ) \x - yf~ r 

< 2 



a \ a 



Finally, for all (3 £ 0\, for all {x, y) £ R 2 such that \x-y\<%, 

|(ln - (ln/ e )W(2/)| < L(^J,a)r!| a; - 2/ |' 3 -'' 

with 

L(M,L,a):=LV max. f 2 f^W/H) " 



/?e[£j] y IAI ! V" 

According to (57), for any integer j e {l,...,r}, || (In || 00 ,[_ Q / 2 ,a/2] < B(j)B(j)a~ j thus it 
yields 

\{\n.fe) U) {x)fr 1 fg{x)dx < f \{\nu J )^\x)\^u{x)dx+[B{ ] )B(j)a-i}^ [ fg(x)dx 

J[-a/2,a/2] e J [-a/2, a/2] 

2/l + e 

< C + [B{j)B(j)aT>] 3 . 
Thus there exists a constant C(/3, /?, C, e, a) such that for any integer j £ {1, . . . , r}, 

|(ln/ e )«(x)|^/ e (x)dz<C + max [S(j)6(j)]^ < C(/?, /3, <7, e, a) 

l<j<r+l — 

and 

/ |L(^,/3,L,a)| 2+ f/ e ( a; )d a ; = |L(^J,Z,a)| 2+ * < C(§_J,C,i,a). 

JR 

The last point assumes that D is even, thus £ Id/2\Jd/2- Then, ln/e is equal to ln(2£) in a 
neighborhood of and for all j £{!,.,., r}, |(ln/ e )^(0)| =0. □ 

Lemmas 3 and 4 show that for any positive even integer $D$ and for all $\beta \in [\underline\beta, \bar\beta]$, $\mathcal{J}(\beta,D) \subset \mathcal{H}(\beta, \mathcal{P}(\underline\beta,\bar\beta))$ where

V{P> P) = {f . li(20, ^(A ft A o). e, <?(£, /3, C, e, a), a, £, M (£, a, M)| . 
4.2. Proof of Theorem 3

Lemma 5. Let $\theta, \theta' \in \{0,1\}^D$. The Hellinger distance between two functions $f_\theta$ and $f_{\theta'}$ of $\mathcal{J}(\beta,D)$ fulfills

1. d%(f e Je,)<^D-W, 
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D 

2. Ve ± 9', d 2 H (f e ,fe') > £a(2A)~ 2 r5(0, 6>')£>~ (2/3+1) where 6(9,9') = £ is the Hamming 

distance between 9 and 6' . 

Proof. 

The Hellinger distance between fg and f$i can be decomposed as follows: 



d? H (fe,fe>) 



■a/2, a/2] 



D 



dx + l [ 




J ^ J[-q/2,q/2] c 






-i 2 



dx 



dx. 



Since the quantity under the brackets is equal to zero if 9j = 9j , it gives 



= |E / k-2v^(2^^) 



Note that (^f^j < 1 for all x G and H^IU = ^^IMIoo < £■ Then, 



since -^1 — y > 1 — y for all y G [0, 1]. Thus, 
4e-2^(20 2 -^(a;) 

< m- 1 



dx < 



fir 



-0 



.4 



1 a 



d.r 



< (40" 



-1 {ZD^Y ol 



since J R ip 2 (y)dy = 1. Finally, 



<P H (fe,fe>) < (40" 



i (ZD-f>X a 1 



A D2 



~ 8A 2 



since £(0,0') < £>• 
For the lower bound, we have






VW-^) 2 =2^1-(^) 
since y/1 — y < 1 — \y for all y € [0, 1]. Thus, 



<2£ 



1 1 (<£M\ 
2\ 2£ J 



dx > 



> (20 



> (20 



V A J D 



dx 



l P 2 (y)dy 



V A ) D 



and finally 



dWe,fe>) > (20 _i U 



> ia{2A)- 2 D^ +1 ^5{e i e'). 



□ 
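The two elementary inequalities invoked in the proof above, $1 - y \leq \sqrt{1-y} \leq 1 - y/2$ on $[0,1]$, can be checked numerically on the pointwise quantity $4\xi - 2\sqrt{(2\xi)^2 - \varphi^2}$. The sketch below is only an illustration: it assumes that the two densities take the values $2\xi + \varphi$ and $2\xi - \varphi$ at a point where $\theta_j \neq \theta'_j$ (an assumption read off the displayed formulas), and the function names are hypothetical.

```python
import math


def sqrt_gap(xi, phi):
    """Pointwise value (sqrt(2*xi + phi) - sqrt(2*xi - phi))**2."""
    return (math.sqrt(2 * xi + phi) - math.sqrt(2 * xi - phi)) ** 2


def closed_form(xi, phi):
    """Same quantity written as 4*xi - 2*sqrt((2*xi)**2 - phi**2)."""
    return 4 * xi - 2 * math.sqrt((2 * xi) ** 2 - phi ** 2)


def sandwich_holds(xi, phi, tol=1e-12):
    """Check phi^2 / (2 xi) <= closed_form <= phi^2 / xi.

    The lower bound uses sqrt(1 - y) <= 1 - y/2 and the upper bound uses
    sqrt(1 - y) >= 1 - y, both with y = (phi / (2 xi))**2 in [0, 1].
    """
    value = closed_form(xi, phi)
    lower = phi ** 2 / (2 * xi)
    upper = phi ** 2 / xi
    return lower - tol <= value <= upper + tol
```

Integrating this sandwich over the intervals where $\theta_j \neq \theta'_j$ is what produces an upper and a lower bound of the same order in $D$.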



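Lemma 6. Let $\theta, \theta' \in \{0,1\}^D$. The Kullback-Leibler divergence between two functions $f_\theta$ and $f_{\theta'}$ of $\mathcal{J}(\beta, D)$ fulfills

$$\mathrm{KL}(f_\theta, f_{\theta'}) \leq \frac{5\xi a}{4A^2} D^{-2\beta}.$$

Proof. The Kullback-Leibler divergence between $f_\theta$ and $f_{\theta'}$ is given by

$$\begin{aligned}
\mathrm{KL}(f_\theta, f_{\theta'}) &= \int_{\mathbb{R}} f_\theta(x) \ln\frac{f_\theta(x)}{f_{\theta'}(x)}\,dx \\
&= \int_{[-a/2,a/2]} f_\theta(x) \ln\frac{f_\theta(x)}{f_{\theta'}(x)}\,dx + \int_{[-a/2,a/2]^c} \omega(x) \ln\frac{\omega(x)}{\omega(x)}\,dx \\
&= \int_{[-a/2,a/2]} f_\theta(x) \ln\frac{f_\theta(x)}{f_{\theta'}(x)}\,dx.
\end{aligned}$$

Then for all $x \in [-a/2, a/2]$ and for all $\theta \in \{0,1\}^D$, $f_\theta(x) \in [\xi, 3\xi]$ according to Lemma 3, thus $\left\| f_\theta / f_{\theta'} \right\|_{\infty,[-a/2,a/2]} \leq 3$. According to Lemma 7.23 in Massart (2007),

$$\mathrm{KL}(f_\theta, f_{\theta'}) \leq 2\left(2 + \ln\left\|\frac{f_\theta}{f_{\theta'}}\right\|_\infty\right) d_H^2(f_\theta, f_{\theta'}).$$

Since furthermore $f_\theta = f_{\theta'}$ on $[-a/2, a/2]^c$, we obtain

$$\mathrm{KL}(f_\theta, f_{\theta'}) \leq 2\left(2 + \ln \sup_{[-a/2,a/2]} \frac{f_\theta}{f_{\theta'}}\right) d_H^2(f_\theta, f_{\theta'}) \leq 10\, d_H^2(f_\theta, f_{\theta'}) \leq \frac{5\xi a}{4A^2} D^{-2\beta}$$

according to Lemma 5.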



□ 
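The comparison between the Kullback-Leibler divergence and the squared Hellinger distance borrowed from Massart (2007) can be explored numerically on finite distributions. The sketch below is a sanity check, not a proof; it assumes the normalisation $h^2(P,Q) = \frac{1}{2}\sum_i(\sqrt{p_i} - \sqrt{q_i})^2$, and all function names are hypothetical.

```python
import math
import random


def kl(p, q):
    """Kullback-Leibler divergence between two finite distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def hellinger_sq(p, q):
    """Squared Hellinger distance, assumed normalisation with factor 1/2."""
    return 0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))


def massart_bound(p, q):
    """Right-hand side 2 * (2 + ln ||p/q||_inf) * h^2(p, q)."""
    ratio = max(pi / qi for pi, qi in zip(p, q))
    return 2 * (2 + math.log(ratio)) * hellinger_sq(p, q)


def random_distribution(rng, size):
    """Random probability vector with weights bounded away from zero."""
    weights = [rng.uniform(0.05, 1.0) for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]
```

When the density ratio is at most 3, as for two densities with values in $[\xi, 3\xi]$, the prefactor is $2(2 + \ln 3) \approx 6.2$, which is below the constant 10 used in the proof.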



Proof of Theorem 3. The proof consists of applying Corollary 1 given in Appendix B with the space $\mathcal{J}(\beta, D)$, the Hellinger distance $d_H$, $p = 2$ and the finite subset $\mathcal{C} = \{f_\theta,\ \theta \in \Theta\}$ where $\Theta$ is the subset of $\{0,1\}^D$ provided by Lemma 16. Then, it has to be checked that

n max KL(f», fg/) < k In 101 . 

According to Lemma 16, ln|0| > ~ and K>\. Moreover, KL(fe,fe>) < x$D~ 2fj and thus D is 
chosen such that 

4A 2 - 16 A 2 ~ 

Since $3\xi a \leq 1$, then $20\,\xi a\, n A^{-2} \leq 7n$ and we finally choose $D = \min\{2k;\ k \in \mathbb{N}^*,\ (2k)^{2\beta+1} \geq 7n\}$. It gives that for any estimator $\hat{s}$,



snpE s [d 2 H (fg,s)} > 2- 2 (1-k) 



min d H {fg,fe>) 



> 2" 2 (1 - n)£a(2A)- 2 D 

> 2~ 2 (1 - n)£a(2Ay 2 D 



-C2P+K 



"(2/3+1 



mm c 

D 



(1 — n)fa „ fi 9 o ._ , 23 

> ^ -P— 2~ 6 - 2f) {7n)-^+^ 

A 2 



according to Lemma 16. 



□ 
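The choice of $D$ in this proof, the smallest even integer whose power $2\beta + 1$ reaches $7n$, is what ties the lower bound to the rate $n^{-2\beta/(2\beta+1)}$. A minimal sketch of that calibration follows; the function names are hypothetical and the constant 7 is taken from the text above.

```python
def choose_D(n, beta):
    """Smallest even integer D = 2k, k >= 1, with D**(2*beta + 1) >= 7*n."""
    k = 1
    while (2 * k) ** (2 * beta + 1) < 7 * n:
        k += 1
    return 2 * k


def minimax_rate(n, beta):
    """Polynomial part n**(-2*beta / (2*beta + 1)) of the lower bound."""
    return n ** (-2 * beta / (2 * beta + 1))
```

For instance, with $n = 100$ and $\beta = 1$ the condition reads $D^3 \geq 700$, so `choose_D(100, 1.0)` returns 10. Since $D^{2\beta+1}$ is of order $n$, the quantity $D^{-2\beta}$ is of order $n^{-2\beta/(2\beta+1)}$, up to constants depending on $\beta$.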



5. Proof of Theorem 4 

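Under the hypotheses of Section 2.4, let $\mathcal{P}(\underline{\beta}, \bar{\beta})$ be the parameter set given in Proposition 1. In order to prove Theorem 4, we start with the following lemma, which makes the connection between the models $\mathcal{S}_m$ and the approximation result given in Theorem 2.

Lemma 7. There exists a positive constant $c_{\underline{\beta},\bar{\beta}}$ such that for all $\beta \in [\underline{\beta}, \bar{\beta}]$ and for all $s \in \mathcal{H}(\beta, \mathcal{P}(\underline{\beta}, \bar{\beta}))$,

$$\mathrm{KL}(s, \mathcal{S}_m) \leq c_{\underline{\beta},\bar{\beta}}\, \lambda(m)^{\beta}.$$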
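Proof. According to Theorem 2, the level $\sigma(\beta)$ under which the approximation (8) is valid is a continuous function of $\beta$. Thus we can define the positive constant

$$\sigma(\underline{\beta}, \bar{\beta}) := \inf_{\beta \in [\underline{\beta}, \bar{\beta}]} \sigma(\beta).$$

Next, let

$$m_0(\underline{\beta}, \bar{\beta}) := \inf\left\{m \geq 2;\ \sqrt{\lambda(m)} \leq \sigma(\underline{\beta}, \bar{\beta})\right\}$$

and consider $m \geq m_0(\underline{\beta}, \bar{\beta})$. Then Theorem 2 can be applied for $\sigma = \sqrt{\lambda(m)}$: for all $\beta \in [\underline{\beta}, \bar{\beta}]$ and for all $s \in \mathcal{H}(\beta, \mathcal{P}(\underline{\beta}, \bar{\beta}))$, there exists a mixture $p$ with less than $G_\beta\, \lambda(m)^{-1/2} \left|\ln \sqrt{\lambda(m)}\right|^{3/2}$ components, with means belonging to $[-\mu(m), \mu(m)]$ and with the same variance $\lambda(m)$ for each component such that

$$\mathrm{KL}(s, p) \leq c_\beta\, \lambda(m)^{\beta}.$$

Since $G_\beta$ is a non decreasing function of $\beta$, the number of components is less than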



Gg ( VAM) lnv^M 



Gi 



(km) 1 

m 



ln(^ (lnm)HI 



< m- 



< m 



G 



In < 



'/3 



111 I 



3 I In In m I 
1 + [ 

2 lnm 



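according to the definition of $\sqrt{\lambda(m)}$ and Condition (10). This shows that $p \in \mathcal{S}_m$ and thus $\mathrm{KL}(s, \mathcal{S}_m) \leq c_\beta\, \lambda(m)^{\beta}$ for all $m \geq m_0(\underline{\beta}, \bar{\beta})$. Since $c_\beta$ is continuous in $\beta$, there exists $c_{\underline{\beta},\bar{\beta}} > 0$ such that for all $\beta \in [\underline{\beta}, \bar{\beta}]$, for all $s \in \mathcal{H}(\beta, \mathcal{P}(\underline{\beta}, \bar{\beta}))$, and for all $m \geq m_0(\underline{\beta}, \bar{\beta})$,

$$\mathrm{KL}(s, \mathcal{S}_m) \leq c_{\underline{\beta},\bar{\beta}}\, [\lambda(m)]^{\beta}. \qquad (58)$$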



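It remains to show the same result for $m < m_0(\underline{\beta}, \bar{\beta})$: let $t_m$ be a mixture of $\mathcal{S}_m$; for all $\beta \in [\underline{\beta}, \bar{\beta}]$ and for all $s \in \mathcal{H}(\beta, \mathcal{P}(\underline{\beta}, \bar{\beta}))$,

$$\mathrm{KL}(s, \mathcal{S}_m) \leq \mathrm{KL}(s, t_m) \leq \int M\psi(x) \ln\left(\frac{M\psi(x)}{t_m(x)}\right) dx < +\infty.$$

Then it can be easily shown that (58) is valid for all $m > 1$ by changing the constant $c_{\underline{\beta},\bar{\beta}}$.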



□ 



Proof of Theorem 4. In order to upper bound the right-hand side of the oracle inequality (3), we first control the constant $A$ defined by (2) that depends on the parameters of the Gaussian mixture model $\mathcal{S}_m$:



A 2 < 4 I ln(67re 2 ) + it + In p,(m)» 



In 



/l44A(m) 
V AM 



For the third term, we note that 
In p(m)\ 



ciX(m) 



ciA(m) / 

' 4(G^) 2 |lnA(m)| 
ci A(m) 



In 



< cp ln(m) 
since $\sqrt{\lambda(m)} := \frac{1}{m}(\ln m)^{3/2}$. For the last term,



In 



/l44A(m) 



= In 



144 m 2 A 



< Cp ln(m) 



and thus $A^2$ is upper bounded by $c_\beta \ln(m)$. For the observation of an $n$-sample, the model collection is indexed by $\mathcal{M}_n = \{2, \ldots, n\}$ and then $m \leq n$. Thus for all $m \in \mathcal{M}_n$,



pen(m) = k 



3m — 1 



l + 2.4 2 + ln 



1 A 



D(m) 



A 2 



111 



< ca — [lnn + lnm] 



H n 
in 

< eg — ln(n). 

p n 

According to Lemma 7 and the definition of $\lambda(m)$, the oracle inequality yields the upper bound

1 



E[d£(a,a A )] < C inf 

m£M r 



KL(s, S m ) + pen(m) + 



< Ca s inf 



(lnm) 3 " Inn 



Let m r 



inf |? 



< m^f |. Note that if m„ = 2, then E [d£ (s, s^)] < 



m > 2 ; to G 

and this case is completed. Assuming now that $m_n > 2$, we want to check that $m_n \leq n$. According to the definition of $m_n$,

K - 1) 2 ' 3 + 1 n 
[ln(m„ - 1)] 3/3 < Inn 



thus 



(to» - 1) 
ln(m„ - 1) 

t her wise. 
(m„ - 1) 



< 



In n 



where □ = 3/3 if (3 > 1 and □ = 2/3 + 1 otherwise. Next, since jfi > 1' 

n 

ln(m n — 1) Inn 

in all cases. Assuming that $n \geq 3$, it follows that $m_n \leq n$. Since $m_n \in \mathcal{M}_n$,

[ln(m„ - 1)] 3/3 



E[d£(s,sm)] < 2c 



< 2c« <p2 2 ^ 



(m n -l) 2 P 
[lnm„] 3 ' 3 



■;?) 



2il 



< 2 2 ^ +1 c /J( 3[ln77i, i ] 



•V 



2/3 



In i 



(In to) 



-3/3 



< c^n (lnn) 2 ' 3 + 1 



□ 
6. Conclusion

In this paper, the penalized estimator $\hat{s}_{\hat{m}}$ defined in Maugis and Michel (2009) is shown to be adaptive to the regularity on some density classes $\mathcal{H}_\beta$ whose elements are univariate densities whose logarithm is locally $\beta$-Hölder. To prove this result, the approximation result given in Kruijer et al. (2010) has been adapted to control the bias term between our Gaussian mixture models and the density classes $\mathcal{H}_\beta$. A lower bound for the minimax risk on the density classes $\mathcal{H}_\beta$ has also been stated to finally prove that our estimator reaches the minimax rate.

In Maugis and Michel (2009), a Gaussian mixture estimator fulfilling an oracle inequality such as (3) is proposed in the context of multivariate data clustering. In future work, it would be interesting to extend our adaptive result to this multivariate case. This requires stating an approximation result such as Theorem 2 on multivariate density classes that remain to be determined, which is clearly a technical task.
Appendix A: Appendices for the approximation result
A.1. Gaussian kernel properties

Lemma 8. Let $\psi_{(p)}(x) = C_p e^{-|x|^p}$ for all reals $x$, where $C_p$ denotes the normalizing constant $C_p = [2\Gamma(1 + 1/p)]^{-1}$. Given a positive integer $u$, let $f_{u,p}$ be the $u$-fold convolution of $\psi_{(p)}$. Then, for any $t \geq 0$ and for all $H > 0$, there exists a number $k' = k'(p, t, u, H)$ such that for all $\sigma < 1$,

$$\int_{|x| > k' |\ln \sigma|^{1/p}} |x|^t f_{u,p}(x)\,dx \leq \sigma^H.$$

Furthermore, $k'$ is a continuous function of $H$.

The reader is referred to Lemma 10 in Kruijer et al. (2010) for the proof of Lemma 8. The next lemma is a technical result used in Lemma 1 to prove the general case $\beta > 2$.

Lemma 9. For all positive integers $u$ and for all integers $k \geq u$,

$$\sum_{j=1}^{k+1} (-1)^j \binom{k+1}{j} \nu_{j,2u} = 0$$

where $\nu_{j,h}$ is the $h$-th moment of the $j$-fold convolution of the Gaussian kernel $\psi$.

Proof. Let $u = 1$ and $k \in \mathbb{N}^*$. For all $j \in \{1, \ldots, k+1\}$, let $(X_1, \ldots, X_j)$ be a sample with density $\psi$. Then

$$\nu_{j,2} = E[(X_1 + \cdots + X_j)^2] = \sum_{h_1 + \cdots + h_j = 2} \frac{2!}{h_1! \cdots h_j!}\, E[X_1^{h_1}] \cdots E[X_j^{h_j}] = j\,E[X_1^2] = j\,\nu_{1,2}$$

since the odd moments of $\psi$ are equal to zero. Thus,

$$\sum_{j=1}^{k+1} (-1)^j \binom{k+1}{j} \nu_{j,2} = \sum_{j=1}^{k+1} (-1)^j \binom{k+1}{j}\, j\, \nu_{1,2} = -(k+1)\,\nu_{1,2} \sum_{j=0}^{k} (-1)^j \binom{k}{j} = 0.$$

We assume now that the result is true until rank u — 1. Let k > u and note that 



fc+i 

j=i 



■fc+i 



E(-!) 4 ( 



t 1 ) 



t=l 



fc+1 

^i,2u + E 

3=2 



fc+1 



E(-d 4 ( 



t 1 ) 



t=j 



{Vj,2u — ^j-l^u)- 



Moreover, v jj2u = ( 2 h u )E[(*i + ■ ■ ■ + X i _i) 2u - h ]E[Jfj l ] = Ep = o (1?) ^-i,2(„-p)^i,2p with the 

convention that i>h.Q = 1. Thus, 



EWm^ 

3=1 



fc+i 

^i,2« + E 



t=l J 3=2 

fc+1 fc+1 fc+1 

I>iW)+EE(-iW) 

t=l 3=2 t=3 

M-l ffc+1 fc+1 

E ©9 ^ E B- 1 )' (t 1 ) 

p=l I j=2 t=j 



Ec-W 1 ) 

t=3 



E (^p) I/ J-l,2(«-p) 1/ l,2p 

v.p=i y 



^l,2u 



fj-l,2(u-p) 



It can be checked that the term inside the brackets of (59) is null. For (60), noting that 



fc+i 



E (-W 1 ) 
t=i+i 



we have that for all 1 < p < u — 1 , 



E [(A) + + 



fc+i 



*=3 + l 



(-i) fe - (-ly (*) + (-i) fc+1 = -{-iy (*) 



(59) 
(60) 



fc+i 

E 

3=2 



fc+i 



Et- 1 )' TO 



t=3" 



fj-l,2(u-p) 



-E^ 1 )' (3')^2(»-p)=0 



3 = 1 



fc+1 



according to the induction assumption. Finally, ^2 ( — 1) J ( j ^3,2-u = 0. 

3=1 



□ 
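The identity of Lemma 9 can be checked with exact rational arithmetic. The sketch below assumes that $\psi(x) = \pi^{-1/2} e^{-x^2}$, so that the $j$-fold convolution is the centered Gaussian density with variance $j/2$ and $\nu_{j,2u} = (2u-1)!!\,(j/2)^u$; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def double_factorial_odd(u):
    """Return (2u - 1)!! = 1 * 3 * ... * (2u - 1)."""
    result = 1
    for odd in range(1, 2 * u, 2):
        result *= odd
    return result


def nu(j, u):
    """2u-th moment of N(0, j/2), i.e. of the j-fold convolution of psi."""
    return double_factorial_odd(u) * Fraction(j, 2) ** u


def alternating_sum(k, u):
    """Sum over j = 1..k+1 of (-1)^j * C(k+1, j) * nu_{j,2u}."""
    return sum((-1) ** j * comb(k + 1, j) * nu(j, u) for j in range(1, k + 2))
```

Under this assumption the sum reduces to a multiple of $\sum_j (-1)^j \binom{k+1}{j} j^u$, which vanishes exactly when $1 \leq u \leq k$; for $k < u$ it need not vanish, for example `alternating_sum(1, 2)` equals 3/2.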



A.2. Measure discretization

The following result is adapted from Lemma 2 in Ghosal and van der Vaart (2007). It allows us to 
approximate a general Gaussian mixture by a finite Gaussian mixture with a limited number of 
components. 
Proposition 3.

Let $F$ be a probability measure on $[-a, a]$ and $\sigma > 0$ such that $\sigma \leq a$. Let $\varepsilon \in (0, \pi^{-1/2})$. Then there exists a discrete distribution $F'$ on $[-a, a]$ with at most $54\, a \sigma^{-1} e^2 \left[1 \vee \ln\frac{1}{\sqrt{\pi}\varepsilon}\right]$ support points such that

$$\|F * \psi_\sigma - F' * \psi_\sigma\|_\infty \leq \frac{2\varepsilon}{\sigma}.$$



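Proof. The interval $[-a, a]$ can be partitioned into $k = \lfloor 2a/\sigma \rfloor$ disjoint consecutive subintervals $I_1, \ldots, I_k$ of length $\sigma$ and a final subinterval $I_{k+1}$ of length $l < \sigma$: $I_i = [a_i, a_i + \sigma)$, $i = 1, \ldots, k$ and $I_{k+1} = [a_{k+1}, a_{k+1} + l]$.

We decompose $F$ on this partition: $F = \sum_{i=1}^{k+1} F(I_i) F_i$ where each $F_i$ is a probability measure concentrated on $I_i$. Then, $F * \psi_\sigma(x) = \sum_{i=1}^{k+1} F(I_i) (F_i * \psi_\sigma)(x)$. Let $Z_i$ be a random variable distributed according to $F_i$, and let $G_i$ be the law of $W_i = (Z_i - a_i)/\sigma$. Thus $G_i$ is a probability measure on $[0, 1]$ for $i = 1, \ldots, k$ and on $[0, l/\sigma] \subset [0, 1]$ for $i = k + 1$. Lemma 10 is applied for each measure $G_i$ and with $D = \left[\ln\frac{1}{\sqrt{\pi}\varepsilon}\right]^{-1/2}$. We obtain discrete distributions $G_i'$ such that $\|G_i * \psi - G_i' * \psi\|_\infty \leq 2\varepsilon$.

Let $F_i'$ be the law of $a_i + \sigma W_i'$ if $W_i'$ has law $G_i'$ and set $F' = \sum_{i=1}^{k+1} F(I_i) F_i'$. We have

$$F_i * \psi_\sigma(x) = E[\psi_\sigma(x - Z_i)] = E\left[\frac{1}{\sigma}\psi\left(\frac{x - a_i}{\sigma} - W_i\right)\right] = \frac{1}{\sigma}\, G_i * \psi\left(\frac{x - a_i}{\sigma}\right)$$

and $F_i' * \psi_\sigma(x) = \frac{1}{\sigma}\, G_i' * \psi\left(\frac{x - a_i}{\sigma}\right)$. Thus

$$|F_i * \psi_\sigma(x) - F_i' * \psi_\sigma(x)| = \frac{1}{\sigma}\left| G_i * \psi\left(\frac{x - a_i}{\sigma}\right) - G_i' * \psi\left(\frac{x - a_i}{\sigma}\right) \right| \leq \frac{1}{\sigma}\|G_i * \psi - G_i' * \psi\|_\infty \leq \frac{2\varepsilon}{\sigma}.$$

Then

$$|F * \psi_\sigma(x) - F' * \psi_\sigma(x)| = \left| \sum_{i=1}^{k+1} F(I_i)\left[F_i * \psi_\sigma(x) - F_i' * \psi_\sigma(x)\right] \right| \leq \frac{2\varepsilon}{\sigma} \sum_{i=1}^{k+1} F(I_i) = \frac{2\varepsilon}{\sigma}.$$

Thus $\|F * \psi_\sigma - F' * \psi_\sigma\|_\infty \leq \frac{2\varepsilon}{\sigma}$ and the number of support points of the discrete distribution $F'$ is upper bounded by

$$\sum_{i=1}^{k+1} 18\left(1 \vee \left[\ln\frac{1}{\sqrt{\pi}\varepsilon}\right]^{-1/2}\right)^2 e^2 \ln\frac{1}{\sqrt{\pi}\varepsilon} \leq (k+1)\, 18\, e^2 \left[1 \vee \ln\frac{1}{\sqrt{\pi}\varepsilon}\right] \leq 54\, a \sigma^{-1} e^2 \left[1 \vee \ln\frac{1}{\sqrt{\pi}\varepsilon}\right].$$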

□ 

The following lemma is an adaptation of Lemma 3.1 in Ghosal and van der Vaart (2001). For this lemma, one introduces the inverse function of $\psi(\cdot)$ defined by $\psi^{-1}(y) = \sqrt{-\ln(\sqrt{\pi}\, y)}$ on $(0, \pi^{-1/2}]$.

Lemma 10. Let $F$ be a probability measure on $[0, B]$. Let $\varepsilon \in (0, \pi^{-1/2})$ and let $D$ be a positive constant such that $B \leq D\psi^{-1}(\varepsilon)$. Then there exists a discrete distribution $F'$ on $[0, B]$ with at most $18(1 \vee D)^2 e^2 \ln\frac{1}{\sqrt{\pi}\varepsilon}$ support points such that

$$\|F * \psi - F' * \psi\|_\infty \leq 2\varepsilon.$$

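Proof. Let $x_0$ be a positive constant which can be calibrated.
• Case 1: Suppose that $|x| > x_0$. Then,

$$|F * \psi(x) - F' * \psi(x)| \leq \int \psi(x - u)\, d(|F - F'|(u)) \leq \frac{1}{\sqrt{\pi}} \int_0^B \exp[-|x - u|^2]\, d(|F - F'|(u)).$$

If $x_0 \geq 2B$ then $x_0 - B \geq \frac{x_0}{2}$. Thus, for all $|x| > x_0$ and $0 \leq u \leq B$, $|x - u|^2 \geq (x_0 - B)^2 \geq \frac{x_0^2}{4}$ and

$$|F * \psi(x) - F' * \psi(x)| \leq \frac{1}{\sqrt{\pi}} \int_0^B \exp[-|x - u|^2]\, d(|F - F'|(u)) \leq \frac{2}{\sqrt{\pi}} \exp\left[-\frac{x_0^2}{4}\right].$$

If $\varepsilon < \pi^{-1/2}$, we choose $x_0$ such that

$$\exp\left[-\frac{x_0^2}{4}\right] \leq \varepsilon\sqrt{\pi} \iff x_0 \geq 2\sqrt{-\ln(\sqrt{\pi}\varepsilon)} = 2\psi^{-1}(\varepsilon).$$

Finally if $x_0 = 2\max(B, \psi^{-1}(\varepsilon))$, then $\|F * \psi - F' * \psi\|_{\infty, [-x_0, x_0]^c} \leq 2\varepsilon$.
• Case 2: Suppose that $|x| \leq x_0$. By Taylor's expansion of $e^y$ and $k! \geq k^k e^{-k}$, we have for any $y < 0$, $k \geq 1$,

$$\left| e^y - \sum_{j=0}^{k-1} \frac{y^j}{j!} \right| \leq \frac{|y|^k}{k!} \leq \left(\frac{e|y|}{k}\right)^k.$$

We use this inequality with $y = -x^2$ thus

$$\left| \psi(x) - \frac{1}{\sqrt{\pi}} \sum_{j=0}^{k-1} \frac{(-1)^j x^{2j}}{j!} \right| \leq \frac{1}{\sqrt{\pi}} \left(\frac{e x^2}{k}\right)^k.$$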
Then, it leads to 

\F* i>(x) - F' * ip(x)\ < 



77? 51 



; 2 J(-lp 



'7T ' 7! 
J=0 J 



< 



k 



1 /ex 



„b ^ k—i 

Jo 



(x-u) 2 ^-iy 



d(F-F')(u) 



^ k-l 

V 3=0 



{x-u) 2 3{-iy 



d(F-F')(u) 



(61) 
(62) 



The term (62) can be written 
B 1 fe_1 {x~u) 2 H-iy 



-y 

3=0 



d(F - F'){u) 



j=0 t=o Jo 
According to Lemma A.1 in Ghosal and van der Vaart (2001), there exists a discrete distribution $F'$ with at most $2k - 1$ support points such that $\int u^t\, dF'(u) = \int u^t\, dF(u)$ for all $1 \leq t \leq 2k - 2$. Finally, considering this discrete distribution $F'$, we obtain that (62) is null.
For the term (61),



if)(x — u) — 



1 fe-i 



d(F-F')(u) 



j=o 



< 



e(x — u)' 



d{F+F'){u). 



Since \x\ < x and < u < B, \x - u\ < \x\ + \u\ < x + B < ^f 2 -, we obtain that 



B 1 fe(x-u) 2 " 



d(F + F')(u) < -j= 



r\ 1 \ & 



% V 4fc 

Moreover, since x = 2max(B ) ^)- 1 (e)) and B < Dt]r 1 {e), x < 2(1 V £>)V> _1 (e) = 2(1 V 



9exn x 



y/n \ 4k 



< 



< 



9e(l V Df 



hi 



1 



: exp 



-k I In 



TVS 

9e(lVD) 



-In 



hi 



'TT£ 



We have that £ < n 2 < l and we choose fc such that k > In Q) and In 



VD) 



In 



In 



l 



> 1. This is the case if fc = 9(1 V D) 2 e 2 hW^M. Finally, the term (61) is 



upper bounded by 2ir <2e and ||-F * ip — F' * ip\\oc,,[-x ,x ] < 2e. 



□ 
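The moment-matching idea behind this proof can be illustrated on a toy case. The sketch below is not the construction of Ghosal and van der Vaart: it assumes $F$ uniform on $[0,1]$ and takes for $F'$ the three-point Gauss-Legendre rule, which matches the moments of $F$ up to order 5, then compares $F * \psi$ and $F' * \psi$ on a grid. All names are hypothetical.

```python
import math


def psi(x):
    """Gaussian kernel psi(x) = exp(-x^2) / sqrt(pi)."""
    return math.exp(-x * x) / math.sqrt(math.pi)


# Three-point Gauss-Legendre rule mapped from [-1, 1] to [0, 1].
_R = math.sqrt(3.0 / 5.0)
NODES = [(1.0 - _R) / 2.0, 0.5, (1.0 + _R) / 2.0]
WEIGHTS = [5.0 / 18.0, 8.0 / 18.0, 5.0 / 18.0]


def smooth_uniform(x):
    """F * psi(x) for F uniform on [0, 1], in closed form via erf."""
    return 0.5 * (math.erf(x) - math.erf(x - 1.0))


def smooth_discrete(x):
    """F' * psi(x) for the three-point discrete distribution F'."""
    return sum(w * psi(x - u) for u, w in zip(NODES, WEIGHTS))


def moment_gap(t):
    """Absolute difference between the t-th moments of F' and F."""
    discrete = sum(w * u ** t for u, w in zip(NODES, WEIGHTS))
    return abs(discrete - 1.0 / (t + 1))


def sup_gap(lo=-6.0, hi=7.0, steps=2601):
    """Maximum of |F * psi - F' * psi| over a regular grid of [lo, hi]."""
    grid = (lo + (hi - lo) * i / (steps - 1) for i in range(steps))
    return max(abs(smooth_uniform(x) - smooth_discrete(x)) for x in grid)
```

Even with three support points the two smoothed densities are uniformly close, which is the mechanism the lemma quantifies: matching enough moments kills the polynomial part of the Taylor expansion of $\psi$, and only the remainder term is left to control.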



A.3. Technical results for $f$, $f_k$, $g_k$, $h_k$ and their convolutions

The following lemma allows us to bound the derivative functions of $\ln f$. It is based on the smoothness assumptions (4) and (5) and is used in the proof of Lemma 1.



Lemma 11. 

For all j G 



For all j £ {0, . . . , r} and for all n G Z, there exists a constant < l~^ n < oo such that for all 



sup 

2/e[ri7,(n+l)7] 



(In /)«(<,) 



Proof. We first prove Lemma 11 on $[-\gamma, \gamma]$. For all $j \in \{1, \ldots, r\}$, all $f \in \mathcal{H}(\beta, \mathcal{P})$ and all $y \in [-\gamma, \gamma]$, there exists $\tilde{y} \in [-|y|, |y|]$ such that



Qn f)( J+tl )(0) y 7 '"- 7 ' 

(in/) (j) (y) = E , y u ' 

u=0 



(r-i)! 



(ln/)M(y)-(ln/)M(0) 
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Thus, 



r ~ 3 I (In f)( j+u U())\ rl 

|(ln/)«(y)| < E M , ' l^r + ^— -yylyr^(0)l^" r 



< 



u=0 

4-^ u! (r- j)! 



M=0 



and Lemma 11 is proved for $n = -1$ and $n = 0$. Now, assume that Lemma 11 is valid for $n - 1 \geq 0$. Then, proceeding as before, for all $j \in \{1, \ldots, r\}$ and all $y \in [n\gamma, (n+1)\gamma]$, there exists $\tilde{y} \in [n\gamma, (n+1)\gamma]$ such that

On/)«(y) = " E 1 ^"T' - "7)" + [dn /)<-> (y) - (In /)(-> (n 7 ) 



and thus 



(r-i)! 

u=0 v J/ 



|(ln/)«(2/)| < E l(ln/) T (n7)| ^ + ^7)!^ L ^) 



u=0 

/ + 

u! (r - j) 



7 

j+u.n— ] 

u=0 

Finally, Lemma 11 is proved for all $n \in \mathbb{N}$ and a similar proof gives this result for all $n \in \mathbb{Z} \setminus \mathbb{N}$. □
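Lemma 12. Let $f_0 = f$ and, for all $k \in \mathbb{N}$, $f_{k+1} = f - \Delta_\sigma f_k$ with $\Delta_\sigma f_k = K_\sigma f_k - f_k$.

1. For all $x \in \mathbb{R}$, $f_k(x) = \sum_{i=0}^{k} (-1)^i \binom{k+1}{i+1} K_\sigma^i f(x)$.

2. For all $k \in \mathbb{N}$, $\int_{\mathbb{R}} f_k(x)\,dx = 1$.

3. For all $i \in \mathbb{N}$ and for all $x \in \mathbb{R}$, $K_\sigma^i f(x) \leq \frac{M}{\sqrt{\pi}}$ and thus $|f_k(x)| \leq (2^{k+1} - 1)\frac{M}{\sqrt{\pi}}$.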

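Proof. The first result is trivial for $k = 0$. For $k = 1$, we remark that $f_1(x) = f(x) - \Delta_\sigma f(x) = 2f(x) - K_\sigma f(x)$. Then recursively, we have

$$\begin{aligned}
f_{k+1}(x) &= f(x) - K_\sigma f_k(x) + f_k(x) \\
&= f(x) - \sum_{i=0}^{k} \binom{k+1}{i+1} (-1)^i K_\sigma^{i+1} f(x) + \sum_{i=0}^{k} \binom{k+1}{i+1} (-1)^i K_\sigma^{i} f(x) \\
&= f(x) + \sum_{j=1}^{k+1} \binom{k+1}{j} (-1)^j K_\sigma^{j} f(x) + \sum_{i=0}^{k} \binom{k+1}{i+1} (-1)^i K_\sigma^{i} f(x) \\
&= \binom{k+1}{k+1} (-1)^{k+1} K_\sigma^{k+1} f(x) + \sum_{i=0}^{k} (-1)^i \left[\binom{k+1}{i} + \binom{k+1}{i+1}\right] K_\sigma^{i} f(x).
\end{aligned}$$

Since $\binom{k+1}{i} + \binom{k+1}{i+1} = \binom{k+2}{i+1}$ and $\binom{k+1}{k+1} = \binom{k+2}{k+2}$, we have

$$f_{k+1}(x) = \sum_{i=0}^{k+1} \binom{k+2}{i+1} (-1)^i K_\sigma^{i} f(x).$$

Consequently, for all $k \in \mathbb{N}$, $\int_{\mathbb{R}} f_k(x)\,dx = \sum_{i=0}^{k} \binom{k+1}{i+1} (-1)^i \int_{\mathbb{R}} K_\sigma^i f(x)\,dx$. Moreover, it can be easily proved by induction that for all nonnegative integers $i$, $\int_{\mathbb{R}} K_\sigma^i f(x)\,dx = 1$. Thus,

$$\int_{\mathbb{R}} f_k(x)\,dx = \sum_{i=0}^{k} \binom{k+1}{i+1} (-1)^i = -\sum_{j=0}^{k+1} \binom{k+1}{j} (-1)^j + 1 = -(1 - 1)^{k+1} + 1 = 1.$$

For the third result, according to Condition (7), $f(x) \leq M\psi(x) \leq \frac{M}{\sqrt{\pi}}$. And by induction,

$$K_\sigma^i f(x) = \int_{\mathbb{R}} K_\sigma^{i-1} f(u)\, \psi_\sigma(x - u)\,du \leq \frac{M}{\sqrt{\pi}} \int_{\mathbb{R}} \psi_\sigma(x - u)\,du \leq \frac{M}{\sqrt{\pi}}.$$

Finally, $|f_k(x)| \leq \sum_{i=0}^{k} \binom{k+1}{i+1} K_\sigma^i f(x) \leq (2^{k+1} - 1)\frac{M}{\sqrt{\pi}}$. □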
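The recursion $f_0 = f$, $f_{k+1} = f - K_\sigma f_k + f_k$ of Lemma 12 can be unrolled symbolically by tracking only the coefficient of each iterate $K_\sigma^i f$. The following sketch (with a hypothetical function name) does so with integer arithmetic and makes it easy to compare the result with the closed form $(-1)^i \binom{k+1}{i+1}$.

```python
from math import comb


def coefficients(k):
    """Coefficients c[i] of K^i f in f_k.

    Starts from f_0 = f, i.e. c = [1], and applies
    f_{k+1} = f - K f_k + f_k, where applying K shifts the index by one.
    """
    coeffs = [1]
    for _ in range(k):
        shifted = [0] + coeffs  # coefficients of K f_k
        padded = coeffs + [0]   # coefficients of f_k, padded to the same length
        coeffs = [p - s for p, s in zip(padded, shifted)]
        coeffs[0] += 1          # add the extra copy of f
    return coeffs
```

For example `coefficients(1)` returns `[2, -1]`, that is $f_1 = 2f - K_\sigma f$. The coefficients sum to 1, which gives $\int f_k = 1$ once $\int K_\sigma^i f = 1$ is known, and their absolute values sum to $2^{k+1} - 1$, which is the constant in the third point.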

Lemma 13. Let $\beta > 0$ and $k \in \mathbb{N}$ such that $\beta \in (2k, 2k + 2]$. Let $f$ be a density function belonging to $\mathcal{H}(\beta, \mathcal{P})$ where $\mathcal{P} = \{\gamma, l^+, L, \varepsilon, C, \alpha, \xi, M\}$.

1. Let a > such that if Y is distributed from a centered Gaussian density with variance a 2 , 
then P(0 < Y < 2a) = |. For all a < a, 

K a f(x) > f^/(*)- (63) 

2. There exists tr(/3) > and Ap > such that for all a < a(/3), 

K ° h ^ > Q M { lf A ^) fix) - 

Furthermore, <r(/3) can be chosen as a continuous function of (3. 

Remarks 7. The first result of Lemma 13 is based on the monotonicity assumption on $f$. It comes from Remark 3 of Ghosal et al. (1999). In the second result, the constants $\sigma(\beta)$ and $A_\beta$ are due to the result (30) in Lemma 2.

Proof. For the first point, let a < a and Z be a standard centered Gaussian random variable. 
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• If x <G [—a, a], 

K*f(x) > / f(u)ip a (x-u)du 



—a 



fx — a „ \ „/ „ 2a \ „ / a; + a „ 2a 

<i p f <z<oj+pIo<z< — )- p {—— <z< — 

,1 „ / „ a — x\ „ f x + a 

> i\- + p (o<z< )-p 



a 





a 






t)) 


\ 







> 



According to Condition (7), f(x) < Mip(x) < 7r~ 1/2 M for all x € E. Then for all x € [-a, a], 

K,f(x) > ^f,r. 



• If x > a, 

> /(*) / ' ^(u)du 

Jo 

„ . ( ( 2a\ /2a x + a 

> /» j P ( < Z < — J +P ( — < z < — ±— 

> f(x)P (o<Z<^- 

> i/w. 

In the same way, for all x < —a, K a f(x) > ^f(x). 

Finally, since Mn^ 1 / 2 > f, K a f(x) > ^§-f(x) for all x G M. 

For the second point, we take iJi = 4/3 + 1 as in Remark 6; let a < in order to have (32) 
and (63). Then, for all a < a(/3), J R gk{u)du < 1 + Apa 213 and since gk(x) > \f{x) for all x £ R, 



K a h k {x) = I hk(u)ip„(x - u)du 
Jr 

— f ^7 ~ i'cix ~ u)du 
Jm 2J ,a fc 

□ 

Lemma 14. Lei pe (0,1). -For a// xel, we /iave that 



imsart-generic ver. 2011/01/24 file: PreprintMaugisMichel.tex date: January 15, 2013 



Maugis and Michel/ Adaptive density estimation using finite Gaussian mixtures 48 



for allien and for all a < 1 - p 1 / 1 , K l a f(x) < M (-^\ ' ip(px). 
for all a < 1 - p 1 ^, 



1, , A _ / 4 x 



max \f k (x),g k (x), -h k (x) j < 2M \^=j i>(px). 

Proof. The control of K % a f can be proved by applying successively Lemma 15 to /, K a f, K^ 1 f 

with q\ ~ p k /' 1 and q 2 = for each step k. It finally gives that K l a f{x) < M (^^j V'C?' 3 ') f° r an 

x in R and for all a < 1 — p 1 / 1 . This control on K\f together with Lemma 12 give the control on 
//.. According to the definition of g k and previous results, 

9k{x) < 2AI (-^=\ ^(px)t Jak (x) + Mffi(x)lj lh (x) < 2M f-^j ^(px). 



□ 



Finally, h k (x) = g k (x)/ J R g k (y)dy < 2g k (x) < AM (^=) if>(px). 

Lemma 15. Let f be a positive application on R such that there exists M > and q\ G (0, 1] such 
that for all x G R, f(x) < Mi()(q 1 x). Then for all q 2 G (0, 1) and all a G (0, 1 - q 2 ), 

Vx G R, K a f{x) < ^=M^( qi q 2 x). 

Proof. Let a G (0, 1 - q%). For all x G R, 

K a f{x) = I f(u)ip<j{x-u)du 



M 

< 



— [ cxp [-q\{o-y - x) 2 } exp(-y 2 )dy 
71 Jr. 

AI f 

< — exp [-qfx 2 (l - a)] / cxp(-y 2 + qfy 2 a(l - a))dy 

71 Jr 

r <2 o on f ( o 1 n 9 \ j 

< —exp [~ qi q 2 x J / expl-y + -q x y j dy 



□ 



Appendix B: Appendices for the lower bound result 

The two following results are crucial for establishing the lower bound: the first one is the so-called Varshamov-Gilbert lemma and the second one is a corollary of a lemma given in Birgé (2005). They correspond to Lemma 4.7 and Corollary 2.19 in Massart (2007), respectively.
Lemma 16. Let $\{0, 1\}^D$ be equipped with Hamming distance $\delta$. Given $\alpha \in (0, 1)$, there exists some subset $\Theta$ of $\{0, 1\}^D$ with the following properties:

$$\delta(\theta, \theta') > \frac{(1 - \alpha) D}{2} \quad \text{for every } (\theta, \theta') \in \Theta^2 \text{ with } \theta \neq \theta',$$

$$\ln|\Theta| \geq \frac{\rho D}{2},$$

where $\rho = (1 + \alpha)\ln(1 + \alpha) + (1 - \alpha)\ln(1 - \alpha)$. In particular $\rho > \frac{1}{4}$ when $\alpha = \frac{1}{2}$.
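The numerical claim $\rho > 1/4$ at $\alpha = 1/2$, and the existence statement for a small $D$, can be checked directly. The sketch below uses a greedy search over $\{0,1\}^D$, which is an illustrative assumption and not the probabilistic argument behind the Varshamov-Gilbert lemma; the function names are hypothetical.

```python
import math
from itertools import product


def rho(alpha):
    """rho = (1 + alpha) ln(1 + alpha) + (1 - alpha) ln(1 - alpha)."""
    return (1 + alpha) * math.log(1 + alpha) + (1 - alpha) * math.log(1 - alpha)


def hamming(u, v):
    """Hamming distance between two binary words of equal length."""
    return sum(a != b for a, b in zip(u, v))


def greedy_packing(D, alpha):
    """Greedily collect words whose pairwise distance exceeds (1 - alpha) D / 2."""
    threshold = (1 - alpha) * D / 2
    chosen = []
    for word in product((0, 1), repeat=D):
        if all(hamming(word, other) > threshold for other in chosen):
            chosen.append(word)
    return chosen
```

For $D = 8$ and $\alpha = 1/2$ the separation threshold is 2 and the lemma guarantees a subset of cardinality at least $e^{4\rho} \approx 2.85$, so at least 3 words; the greedy search is only feasible for small $D$ because it enumerates all $2^D$ words.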

Corollary 1. Let $(S, d)$ be some pseudo-metric space and $\{P_s, s \in S\}$ be some statistical model. Let $\kappa$ denote an absolute constant (given in Corollary 2.18 of Massart, 2007). Then for any estimator $\hat{s}$ and any finite subset $\mathcal{C}$ of $S$ such that $\max_{s,t \in \mathcal{C}} \mathrm{KL}(P_s, P_t) \leq \kappa \ln|\mathcal{C}|$, the following lower bound holds for every $p \geq 1$:

$$\sup_{s \in \mathcal{C}} E_s[d^p(s, \hat{s})] \geq 2^{-p}(1 - \kappa) \min_{s,t \in \mathcal{C},\, s \neq t} d^p(s, t).$$



The following lemma, used to prove Proposition 1, gives an expression of the derivatives of the 
logarithm of a function. 

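Lemma 17. Let $i \in \mathbb{N}^*$ and let $t$ be a strictly positive function, $t \in C^i$. Then

$$(\ln t)^{(i)}(x) = \frac{P_i(x)}{t(x)^{2^{i-1}}}$$

where

$$P_i(x) = \sum_{(\eta_0, \ldots, \eta_i) \in E_i} \rho(\eta_0, \ldots, \eta_i) \prod_{j=0}^{i} \left[t^{(j)}(x)\right]^{\eta_j}$$

with

$$E_i = \left\{(\eta_0, \ldots, \eta_i) \in \mathbb{N}^{i+1};\ \sum_{j=0}^{i} \eta_j = 2^{i-1},\ \sum_{j=0}^{i} j\,\eta_j = i\right\}$$

and the $\rho(\eta_0, \ldots, \eta_i)$'s are the polynomial coefficients.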

Proof. The result is trivial for i = 1. Assume that t is C l+1 and that the result is valid for the i-th 
derivative. Then 



_ t( x y pixy -T-H{x)'t{xy -Lpjjx) _ 



with 



= t(x) 



2*~ 



(»?o,---,i)i)e=i 



x>^n (<(*><»> 

j=Q v J u=0 



'1.1 



3=0 
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- ]T p( V0t ... t r H )2*-H(xyt(xf- 1 - 1 'il (t(x)^) VS . 

(170,— ,»)i)6 s < J =0 

Let fjj denotes the new power of the j-th derivative for j = . . . , i + 1. In the second sum, we 
have that rjo = 2 t ~ 1 — 1 + r)o, fji = r)i + 1, ?7j = % for all j = 2, and = thus 

i+l i i+l i 

E Vj Vj + 1 + — 1 = 2 l and E = E i?7j + 1 = i + 1. In the first sum, 

j=0 j=0 j=0 j=Q 

• if 3 < i- fjo = + rj , % = % - 1, fjj+i = Vj+i + 1, Vu G {1, . . . , + 1}, fju = Vu and 

i+l z i+l i 

Vi+i = thus Vu = J2 Vu + 1 + -1 = 2' and J] ufj„ = J2 u Vu +j + l-j=i + l. 

u=0 u=0 u=0 u=0 

i+l i 

• if 3 = i- fjo = 2 J ~ 1 + no, fji = r] t - 1, f) i+1 = 1, Vu G {1, . . . , i - 1}, fj u = r) u thus Vu=J2 

u=0 u=0 

i+l i 

T] u + 1 + 2 i_1 -l = 2 l and £ U7 ?u = E Wfo + i + 1 - i = i + 1. 

u=0 u=0 

□ 